Site Reliability Engineer
About the Role
We're seeking a highly skilled Site Reliability Engineer to join the AI Platform team in our Platform Engineering domain.
The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build innovative solutions.
As one of the pioneers in cloud-based technology, we hold high standards for security, resilience, and productivity, which require not only modern infrastructure but also high-performing teams that support our customers.
Key Responsibilities:
* Contribute to designing and developing platform components for machine learning and generative AI use cases across the company.
* Take ownership of reliable deployment, maintenance, monitoring, and incident response for our services.
* Write high-quality, maintainable code and ensure our platform solutions are well-documented and testable.
* Collaborate with ML Engineers, SREs, and other Platform teams to ensure operability and maintainability of AI capabilities offered across the company.
Requirements:
* Proven experience in infrastructure and reliability engineering, including deployment automation, monitoring, incident management, and performance tuning.
* Solid programming skills, ideally in Python, Go, or TypeScript, and experience writing production-grade code.
* Familiarity with cloud-native development and AWS infrastructure, including some experience with services like SageMaker or other AI/ML-related services.
* Experience with Kubernetes, CI/CD pipelines (e.g., ArgoCD, GitHub Actions), Infrastructure-as-Code tools (e.g., Terraform), and containerization (Docker).
* Working knowledge of networking, security, and compliance best practices in production environments.
Nice to Have:
* Exposure to MLOps practices or working with Data Science/Machine Learning teams.
* Familiarity with prompt-based or LLM-driven GenAI workflows.
Traits:
* You take pride in writing clean, reliable, and well-tested code.
* You're a proactive team player who communicates openly and supports others.
* You're comfortable working in a cross-functional environment, with a focus on practical impact.