Westhouse is one of the leading international recruitment agencies for the placement of highly qualified experts in fields such as IT lifecycle management, SAP, engineering, commerce and specialist consultancy.
For our client, we are currently looking for a Senior Monitoring and Observability Engineer (m/f/d) - St. Leon-Rot / remote.
Your tasks
1. Design, implement, and manage Prometheus-based monitoring with custom alerting and configurations.
2. Deploy and optimise Thanos for centralised, scalable, long-term metric storage and federation.
3. Configure core Thanos components (Sidecar, Querier, Store Gateway, Compactor, Ruler) for HA and retention.
4. Build and maintain Grafana dashboards for dynamic visualisation and reporting.
5. Configure optimised SNMP jobs for diverse network environments.
6. Onboard new metrics and develop data pipelines for collection and storage.
7. Integrate Prometheus–Thanos environments for global queries, deduplication, and scalability.
8. Implement Streaming Telemetry and OpenTelemetry for real-time monitoring and distributed tracing.
9. Design and maintain Kubernetes-based observability infrastructure for reliability and automation.
10. Manage Prometheus, Thanos, and Grafana deployments via Helm and ArgoCD.
11. Develop CI/CD pipelines (GitHub, Jenkins) and GitOps workflows with ArgoCD for automated deployments.
12. Collaborate with DevOps teams to improve observability via CI/CD, Helm charts, and operators.
13. Troubleshoot monitoring issues and optimise performance across dashboards and data flows.
14. Maintain documentation and procedures, and provide training for teams and stakeholders.