Observability and SRE theatre
A big part of managing complex architectures is being able to understand them. Observability is now a key goal when designing modern technology systems, which are complex and ephemeral, with multiple interdependencies and potential points of failure. As a result, the observability movement is rapidly gaining traction. This theatre examines the concept both as a technology and in terms of what it means at a business level. Site Reliability Engineering (SRE) job titles have become increasingly common in the past couple of years. We will examine what SRE means and hear from companies that have further bridged the gap between developers and IT Ops by successfully adopting this approach to delivering resilient applications at scale.
Subject matter includes: Application Performance Monitoring (APM) – Observability – Root Cause Analysis – Site Reliability Engineering – Customer Experience – User Experience – Incident Management – MTTD – MTTR – Real-Time Monitoring – Log Investigation – Cloud Resource Usage – OpenTelemetry – AIOps