We design, develop and implement unified software solutions for the financial services industry. Whether you're trading on global financial markets, managing investments, providing mortgages or helping your clients plan their financial future, you can rely on our software, and our team, to deliver real outcomes for your business and your clients. Our ...
Monitor and maintain the health and performance of production systems and environments.
Investigate, troubleshoot, and resolve complex system and application issues.
Collaborate with product and platform engineering teams to identify root causes and drive permanent fixes.
Implement automation to improve incident response, monitoring, and reporting.
Develop and maintain tools, scripts, and dashboards that enhance system observability.
Participate in incident management and post-incident reviews to ensure continuous improvement.
Contribute to system performance tuning, capacity planning, and reliability improvements.
Document operational procedures, playbooks, and best practices for system reliability.
Support service transition and release activities to ensure smooth deployments.
Identify recurring issues and implement sustainable solutions to reduce technical debt.
Product Engineering teams
Service Operations and Application Support teams
Site Reliability Engineering (SRE) and Platform Engineering teams
Client Service and Delivery teams
Information Security and Risk teams
Bachelor’s degree in Computer Science, Information Technology, or a related field (required)
Minimum of 3–5 years’ experience in a production support, DevOps, or reliability engineering role (required)
Proficiency in Windows Server administration (preferred)
Proficiency in Linux server administration (desirable)
Knowledge of networking
Certification in AWS, Azure, or Google Cloud Platform (desirable)
Proficiency in databases; certification in Microsoft SQL Server (MSSQL) (desirable)
Experience with monitoring tools such as Datadog, Prometheus, or Grafana
IT Operations: Maintains operational stability of live services and supports infrastructure and systems.
Incident Management: Manages service incidents and ensures timely resolution.
Problem Management: Identifies root causes and implements corrective actions.
Automation: Develops scripts and tools to automate operational tasks.
Systems Integration and Build: Contributes to integrating systems and maintaining deployment pipelines.
Service Level Management: Monitors and reports on service performance against SLAs.
Collaborates: Works effectively across teams to solve problems and share knowledge.
Manages Self: Demonstrates accountability, prioritisation, and attention to detail.
Adapts: Responds flexibly to change and operational challenges.
Thinks Analytically: Identifies patterns, root causes, and opportunities for improvement.
Communicates Effectively: Shares information clearly and in a timely manner with technical and non-technical stakeholders.
Our Culture & Why You’ll Love Working Here