We are seeking a Site Reliability Engineer (SRE) to join an innovative and fast-growing company in Belfast. This role focuses on ensuring the reliability, scalability, and performance of critical infrastructure and services while working with cutting-edge cloud-native technologies. You'll collaborate with engineering teams to enhance system resilience, streamline deployments, and drive automation.
What You'll Do
- Design, build, and optimise resilient, high-performance infrastructure.
- Develop and maintain CI/CD pipelines to enhance deployment efficiency.
- Implement cloud-native and open-source solutions to improve reliability and scalability.
- Proactively monitor and troubleshoot production systems, ensuring uptime and performance.
- Automate infrastructure provisioning and configuration using Infrastructure as Code (IaC).
- Drive continuous improvements in developer experience (DevEx) and operational efficiency.
- Participate in incident response, root cause analysis, and post-mortem reviews.
What You'll Need
- 3+ years of experience in an SRE, DevOps, or Infrastructure Engineering role in a high-scale environment.
- Strong experience with Kubernetes and container orchestration.
- Deep knowledge of cloud platforms and distributed systems.
- Proficiency with databases and messaging systems (e.g., Elasticsearch, Postgres, Neo4j, RabbitMQ, Redis).
- Expertise in CI/CD tooling (e.g., GitHub Actions, Argo CD).
- Hands-on experience with IaC tooling such as Terraform, Terragrunt, or CDK.
- Solid scripting and automation skills (Python, Bash, or Go).
- A proactive, problem-solving mindset with a focus on reliability, automation, and scalability.
If you're passionate about reliability, automation, and cloud-native technologies, we'd love to hear from you. Apply now or reach out to Andrew Harrison for more details.