  • Posted: Oct 28, 2025
    Deadline: Jan 30, 2026
  • Old Mutual Limited (OML) is a premium African financial services group that offers a broad spectrum of financial solutions to retail and corporate customers across key markets in 14 countries.

    OM Bank - Site Reliability Engineer

    Job Description

    • OM Bank is looking for a site reliability engineer to join the OM Bank platform team. The successful candidate will be responsible for maintaining the OM Bank platform, including first-line support for the platform’s technical services and managing service outages through the incident management process.

    KEY RESULT AREAS

    • First-line support for all services that make up the platform
    • Managing the incident management process for production incidents, including detection, triage and resolution, and driving continuous improvement
    • Maintaining the production readiness scorecard defined in Terraform, ensuring its checks work as expected, and adding new checks to the scorecard workflow
    • Creating and maintaining monitors in Datadog that improve observability across the platform
    • Engaging with the wider OM Bank product and build teams to ensure alignment with the observability standards defined by the platform team
    • Designing and implementing enhancements to the platform that contribute towards reducing MTTR (mean time to recovery)  
    • Designing and implementing automation initiatives including self-service capabilities 
    • Implementing Service Level Indicators & Objectives for the platform  
    • Implementing and maintaining Datadog dashboards for the platform
    • Defining and maintaining baseline monitors to be used by product teams  
    • Maintaining the observability repository that contains all service definitions and observability-related configurations
    • Maintaining the feature flag repository containing all feature flag definitions for product teams
    • Maintaining PagerDuty definitions and overall PagerDuty administration
    • Fine-tuning monitors to ensure alerts are triggered appropriately
    • Leading an action center during a production incident, fostering collaboration across the bank to resolve the outage
    • Advising product and platform teams on engineering best practices to ensure services are built with observability and scalability from the start
    • Maintaining overall platform health by monitoring key metrics  
    • Maintaining and extending the SRE API, which is written in Python and deployed to Kubernetes

    ROLE REQUIREMENTS

    • Bachelor’s degree in Computer Science, Electrical or Electronic Engineering, Information Technology, or a related field
    • 7+ years of software and platform engineering experience building and supporting scalable services
    • 3-5 years of experience writing infrastructure as code (Terraform, AWS CDK, CloudFormation)
    • Solid experience using observability platforms like Datadog 
    • Experience with microservices architecture and RESTful APIs
    • Solid Kubernetes experience, demonstrating end-to-end deployment and maintenance of clusters, including designing and building the infrastructure as code required to deploy the cluster and the cloud resources that support it
    • Experience with Kubernetes custom resource management and deployment  
    • Solid experience deploying Kubernetes resources using Helm charts
    • Experience fine-tuning Kubernetes Horizontal Pod Autoscaler (HPA) configurations
    • Moderate experience with the Go and/or Python programming languages
    • Solid experience with GitOps and general Git-based operations
    • Solid infrastructure-as-code background, with experience designing, implementing and maintaining IaC design patterns that manage large-scale cloud environments
    • Solid AWS experience, demonstrating an advanced understanding of cloud architecture and of maintaining distributed systems
    • Experience maintaining queuing systems like AWS SQS and event streaming platforms like Kafka 
    • Experience supporting mobile applications  

    Skills

    • Action Planning, Application Development, Business Process Design, Computer Literacy, Data Management, Data Modeling, Evaluating Information, Identifying Customer Needs, Information Technology (IT) Support, Market Analysis, Oral Communications, Product Development, Technical Support, Technical Troubleshooting, Test Case Management, User Requirements Documentation, Web Development

    Competencies

    • Business Insight
    • Collaborates
    • Courage
    • Cultivates Innovation
    • Decision Quality
    • Drives Results
    • Ensures Accountability
    • Manages Complexity

    Closing Date

    • 01 November 2025

    Method of Application

    Interested and qualified? Go to Old Mutual on oldmutual.wd3.myworkdayjobs.com to apply