What We Do
Site Reliability Engineering
Our approach to Site Reliability Engineering (SRE) builds on established SRE practices while focusing specifically on delivering value to our clients.
Its aim is to engineer the operations of the services we provide so that automation is the primary tool for initiating state changes and for re-establishing balance when something unexpected happens.
We build the processes to support this by focusing on five foundational pillars:
Everyone is empowered to work across our full client engagement and to contribute at all levels.
Metrics, monitoring, and alerting form the nervous system that SRE relies upon. Integration at the earliest stage is fundamental to our approach.
We raise efficiency step by step, moving tasks from manual to procedural and finally to fully automated.
As we change the state of the systems we operate, we need to stay vigilant for unforeseen impacts and agile enough to respond quickly. We achieve this by carefully balancing capacity and effort.
Robust processes harden and remove fragility from client systems. We achieve this by making testing and strengthening the service a key focus for SRE.