We are a team dedicated to engineering excellence, reusable design, and simplicity. We foster a supportive, growth-focused culture where we mentor each other and work together to build resilient, high-quality systems.

Experience as a Site Reliability Engineer, DevOps Engineer, or Software Engineer focused on infrastructure in a large-scale distributed environment.
Strong software development skills in a language like Swift, Go, or Python, and a high degree of comfort with shell scripting (Bash).
Hands-on experience building and managing systems with container orchestration tools (Kubernetes, Docker).
Deep understanding of networking (TCP/IP, DNS, HTTP) and experience using observability tools (monitoring, logging, tracing) to diagnose complex issues.
Expertise in performance analysis and capacity planning for global, distributed systems.
Experience with large-scale distributed databases (e.g., Cassandra, FoundationDB) or messaging systems (e.g., Kafka).
Demonstrated ability to lead incident response for high-impact outages.
Familiarity with using Generative AI (GenAI) or Large Language Models (LLMs) to accelerate operational tasks, such as automating runbooks, generating scripts, or analyzing incident data.