Posted online since: 18 June
Job description
Curious about powering next-generation machine learning and AI workloads?
Join a next-generation cloud platform focused on giving AI builders and machine learning teams powerful, production-grade compute resources and infrastructure without the barriers of traditional providers. The organisation delivers on-demand GPU compute, clusters, serverless inference, and scalable environments designed for complex AI-driven workloads while upholding strong standards around performance, sustainability, and developer simplicity. Benefit from a true automation-first culture, the ability to shape tooling and operational standards in an early-stage platform, and hands-on exposure to high-performance AI infrastructure at scale.
Step into a role driving AI and cloud innovation. Apply today!
Responsibilities:
Design and build automation for Linux-based GPU clusters
Write scripts and tooling in Bash and Python
Improve system reliability, monitoring and incident response
Support AI training environments using Kubernetes, Slurm and Docker
Act as a point of contact during incidents and drive resolution
Identify and automate manual operational processes
Work closely with infrastructure and hardware teams
Contribute to future platform evolution including serverless compute
Skills/Must have:
Strong experience as an SRE or in a similar role
Deep knowledge of Linux systems and operations
Strong scripting or coding skills in Bash and Python
Experience with Kubernetes, Docker and cluster-level tooling
Understanding of HPC or AI workloads and multi-node training
Experience in high-availability, high-pressure environments
Automation-first mindset with an interest in using AI tools
Salary: Up to €130,000 gross per year