Your Hundertserver mission:
As a Site Reliability Engineer (SRE) at Hundertserver, you are responsible for the stable, high-performing, and secure operation of modern cloud platforms. Through automation, monitoring, SLAs, and incident response, you ensure that our systems not only run – but continuously improve. You work closely with customers, development, and infrastructure teams, bring clarity to complex operational issues, and create sustainable solutions – hands-on, pragmatic, and with a high degree of ownership.

The Main Tasks:
Key Responsibilities

Availability & Stability
• Ensuring platform availability according to defined SLOs / SLAs
• Analyzing and resolving incidents & performance issues (including on-call duties)
• Building and maintaining robust alerting, logging, and monitoring setups
• Root cause analysis & implementation of preventive measures

Automation & Infrastructure
• Automating provisioning, scaling, and maintenance (IaC with Terraform, Ansible, etc.)
• Operating and enhancing Kubernetes environments (cloud & on-prem)
• Developing and maintaining self-healing and auto-scaling mechanisms
• Creating and maintaining runbooks & playbooks

Monitoring, Observability & Performance
• End-to-end monitoring with tools like Prometheus, Grafana, Loki, ELK
• Setting up and managing SLIs and SLOs – data-driven platform control
• Performing performance analyses (workloads, traffic, databases) and ongoing optimization
• Setting up & maintaining distributed tracing and logging systems

Security & Operational Hygiene
• Implementing and enforcing security standards (least privilege, TLS, secrets management)
• Regular health checks, updates, and patching
• Ensuring availability through established backup & disaster recovery processes

Collaboration & Consulting
• Close collaboration with development, support, and platform teams
• Consulting customers on operating models, platform metrics & architectural decisions
• Training internal teams on topics such as monitoring, SRE basics & troubleshooting

You are a good fit for our team if:
What You Should Bring

Technical Profile
• Linux expertise (Debian, Ubuntu, RHEL)
• Deep knowledge of Kubernetes – clusters, ingress, operators, Helm, etc.
• Experience with cloud platforms (AWS, Azure, GCP)
• Strong expertise in monitoring stacks (Prometheus, Grafana, Loki, ELK)
• Proficiency in Infrastructure-as-Code (Terraform, Ansible, Puppet)
• Scripting and automation skills (Bash, Python, Go)
• Familiarity with logging, tracing & incident management processes

Soft Skills & Working Style
• Proactive troubleshooting & strong quality awareness
• Structured, analytical thinking – solution-oriented and pragmatic
• Excellent communication skills (with customers, developers, and operations)
• Focus on sustainability & automation rather than firefighting
• Willingness to participate in on-call rotations (standby, SLA windows)

Nice to Have
• Certifications such as CKA / CKS / AWS DevOps or equivalent
• Experience with GitOps, ArgoCD, or Policy-as-Code
• Knowledge of FinOps / cost optimization in cloud platforms

What we offer:
What You Can Expect at Hundertserver
• Real development – in technology, methodology & culture
• Modern platforms & tools – with room for your own ideas
• Ownership & trust – we work in partnership, not through hierarchy
• Flexible working hours & a remote-first culture
• Hands-on mentality & direct customer impact

About us
ONEHUNDRED / Hundertserver is the cloud service provider that doesn’t just support digital transformation – we actively shape it. Based in the heart of Berlin and trusted by clients such as Gründerszene, Edelman, and Prognos, we develop innovative, secure, and sovereign cloud solutions for a connected future. Our team lives and breathes technology, thrives on challenges, and is always pushing the boundaries of what cloud can do.
With over 20 years of experience, deep open-source expertise, and a strong focus on data sovereignty, efficiency, and quality, we guide organizations on their journey into the multi-cloud world. What defines us? Integrity, team spirit, a passion for learning, and the courage to break new ground. We’re open, agile, and driven by progress – and we’re looking for people who share that mindset. Join our team and help shape the future of cloud with us.