Global Resources Leader Implements Robust IT Infrastructure with SRE
Site reliability engineering (SRE) is crucial for resources companies to ensure the resilience, reliability, efficiency, and scalability of their IT infrastructure and operations. Globally, the resources industry is increasingly adopting SRE practices to improve observability, enhance predictive maintenance, and promote data-driven decision-making. All of this contributes to safer and more efficient operations.
Client Brief
The client is one of the largest resources companies in the world. The organization had built remote operation centers and was actively seeking a next-generation approach to ensure their resilience. It partnered with Infosys to strengthen its infrastructure operations by introducing site reliability engineering (SRE).
Challenges
As a first step, the client needed an as-is assessment of its current operations. Several processes were manual, which cost time and reduced efficiency and accuracy. There was no dedicated SRE team to carry out the assessment and implement changes based on the outcome. As a result, incident resolution took a significant amount of time.
Infosys Solution
The Infosys team focused on reducing manual toil and enhancing automation, observability, and productivity.
To achieve this, Infosys conducted an assessment of the current state of infrastructure operations. This involved over 30 preparatory meetings, identification of use case candidates, and over 70 recommendations to reduce effort and increase automation. The team identified seven key SRE focus areas for roadmap creation and planning.
The assessment covered areas such as observability, reliability and availability, backup and restore, and people and process. An assessment plan was defined with all relevant stakeholders, and the Infosys SRE Maturity Framework was used to carry out the assessment in line with that plan. Based on the results, a one-year roadmap for cost reduction was developed.
Business Benefits
Infosys leveraged the SRE team’s extensive experience and deep expertise to deliver several tangible business benefits. These included a reduction of over 3,000 incidents per month. The incident reduction KPIs were well beyond the expected targets, as seen in the table below:
| Track | Target | Achieved |
|---|---|---|
| Windows | 13% | 34% |
| Linux | 7% | 25% |
| Backup | 3% | >50% |
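To make the gap between target and achieved figures explicit, the table can be reduced to a margin in percentage points per track. The sketch below is illustrative only: the `kpis` dictionary and the `margin_over_target` helper are hypothetical names, and the Backup result, reported only as ">50%", is treated as a lower bound of 50.

```python
# Incident-reduction KPIs from the table above, in percent.
# Assumption: Backup is reported only as ">50%", so 50 is used as a lower bound.
kpis = {
    "Windows": {"target": 13, "achieved": 34},
    "Linux": {"target": 7, "achieved": 25},
    "Backup": {"target": 3, "achieved": 50},
}


def margin_over_target(target, achieved):
    """Return the percentage points by which the achieved reduction exceeds the target."""
    return achieved - target


# Margin per track, in percentage points.
margins = {
    track: margin_over_target(values["target"], values["achieved"])
    for track, values in kpis.items()
}

for track, points in margins.items():
    # The Backup margin is a lower bound because the achieved figure is ">50%".
    prefix = "at least " if track == "Backup" else ""
    print(f"{track}: {prefix}{points} percentage points above target")
```

Under these assumptions, Windows and Linux exceed their targets by 21 and 18 percentage points respectively, and Backup by at least 47.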
The automation initiative eliminated over 6,000 hours of manual effort and reduced patching hours by 20% and patching schedules by 40%. It also significantly optimized operational resources by the end of the first year, with the cost savings banked for future transformation spend.
Authored by Sriram Sundar, VP & Business Head, Energy Core, Infosys Limited.