AWS Site Reliability Engineer (Data Platform)

601525 Posted: 20/02/2026 03:42:03

£450 - £455 per day
City of London, England
Permanent

Our client, a prominent organisation in the technology sector, is currently seeking an AWS Site Reliability Engineer (SRE) to support and scale a cloud-native data platform built on AWS, Snowflake, and Databricks. This role focuses on enhancing reliability through automation, disaster recovery testing, resiliency engineering, observability, and proactive SLO/SLI/SLA management.

Key Responsibilities:

Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using IaC and CI/CD.
Lead resiliency and disaster recovery planning, including regular DR drills, failure testing, and recovery validation across AWS and data platform components.
Define, implement, and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services; utilise error budgets to guide reliability improvements.
Build and operate robust observability solutions (metrics, logs, traces, alerts) for AWS services, Snowflake, and Databricks workloads.
Partner with data engineering and platform teams to embed reliability-by-design into architecture and delivery practices.
Perform root cause analysis (RCA) and drive continuous improvement to reduce toil and enhance platform availability and performance.
Own and drive resolution of incidents and service requests raised by consumer teams, providing operational support and automating fixes to improve reliability and user experience.

Job Requirements:

Practical knowledge of SRE principles, including SLO/SLI/SLA design and error budgets.
Strong experience with AWS (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
Experience with observability tools and monitoring/alerting best practices.
Hands-on experience with automation and IaC (Terraform, CloudFormation, CDK) and scripting (Python, Bash).
Exposure to data platforms such as Snowflake and/or Databricks.

Nice to Have:

Experience running DR tests, chaos engineering, or resiliency testing in cloud environments.
Familiarity with CI/CD pipelines and GitOps practices.
Background supporting large-scale data or analytics platforms.

If you have the expertise and passion for cloud-native data platforms and are ready to take on new challenges in a dynamic contract role, we would love to hear from you. Apply now to join our client's innovative team.

Harry Stayman Associate Consultant

