What you’ll do
We are seeking a highly skilled (Senior) Site Reliability Engineer (SRE) to join our Sovereign Cloud Automation & Tooling (SAT) Team in Berlin. This role is pivotal in ensuring the stability, performance, and scalability of our cloud infrastructure. The ideal candidate will have extensive experience with observability tools, automation, and cloud technologies across AWS and Azure environments.
Your tasks will include:
1. Design, implement, and manage robust observability solutions using tools such as Dynatrace, Grafana, Prometheus, and Site24x7.
2. Develop, maintain, and improve monitoring, alerting, and incident response processes to ensure system reliability and minimize downtime.
3. Collaborate with development teams to enhance application performance, scalability, and reliability through proactive monitoring insights.
4. Manage and optimize cloud infrastructure across AWS and Azure using Infrastructure as Code (IaC) tools like Terraform and Bicep.
5. Develop and maintain CI/CD pipelines to automate software deployments and infrastructure updates.
6. Write and maintain automation scripts in Bash (or other shell languages) and Python to support operational efficiency.
7. Identify performance bottlenecks, implement improvements, and support post-incident reviews to drive continuous improvement.
8. Work closely with security teams to ensure compliance, adherence to security best practices, and data protection in cloud environments.
9. Maintain comprehensive documentation for observability configurations, automation processes, and cloud infrastructure standards.
What you bring
1. Proven experience as an SRE or DevOps Engineer, or in a similar role, in cloud environments.
2. Expertise in observability tools such as Dynatrace, Grafana, Prometheus, and Site24x7 for performance monitoring and alerting.
3. Strong proficiency in AWS and Azure cloud services.
4. Hands-on experience with Terraform, CloudFormation, or Bicep for Infrastructure as Code (IaC).
5. Proficiency with CI/CD tools such as Jenkins, GitLab CI, or Azure DevOps.
6. Solid scripting skills in Bash (or other shell languages) and Python for automation tasks.
7. Strong troubleshooting skills with a focus on performance tuning and incident management.
8. Experience in securing cloud environments and implementing compliance best practices.
Preferred Qualifications
1. Certifications such as AWS Certified DevOps Engineer – Professional, AWS Certified Solutions Architect – Professional, Microsoft Certified: Azure DevOps Engineer Expert, or Microsoft Certified: Azure Solutions Architect Expert.
This position can also be filled on a part-time basis.