
Senior Site Reliability Engineer & Incident Manager (m/f/d)

Berlin
emnify
Engineer
Listing online since: published 23 hours ago
Description

Your Role

Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you?

emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority for everything observability and resiliency, and guide internal stakeholders with best practices.

As part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaborative mindset to ensure continuous improvement of the development experience at emnify.

The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.

The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.

emnify technology radar

The position is based in emnify’s office in Berlin.

Your Impact:

  • Incident management operations: Lead and optimize the incident management process end-to-end, ensuring timely detection, resolution, and documentation of incidents; coordinate cross-functional teams, conduct post-mortems and root cause analyses, and drive continuous improvements to workflows.
  • Observability and monitoring: Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.
  • Collaboration and support: Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.
  • Platform engineering: Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.

Your Skills:

  • Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company.
  • Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.
  • Experience in establishing and managing incident management processes.
  • Understanding of incident management frameworks and best practices.
  • Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).
  • Expert skills with modern infrastructure tooling and principles (Kubernetes; IaC with Terraform; CI/CD with GitHub Actions and Jenkins).
  • Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).
  • Advanced documentation skills for effective knowledge sharing and collaboration.
  • Exceptional problem-solving and critical thinking, with a passion for enhancing development experiences in fast-paced tech environments.
  • Ability to work independently and as part of a team.

Nice to have:

  • Knowledge of networking protocols and telecom systems.
  • Knowledge of secure software development.
  • Familiarity with programming languages such as Python, Go, or Java.
  • Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).


© 2025 Jobijoba - All rights reserved
