Your Responsibilities:
1. Develop and own the infrastructure and platform roadmap; evaluate and recommend technologies and architectural patterns that support infrastructure scalability, cost efficiency, and agility
2. Define comprehensive technical specifications, with strong emphasis on non-functional requirements: security (zero-trust, encryption, policy-as-code, compliance frameworks), performance (latency, throughput, high availability, disaster recovery), reliability, observability, and cost optimization
3. Champion Infrastructure as Code (IaC) and GitOps practices using tools such as Terraform/OpenTofu, Pulumi, Helm, ArgoCD/Flux, and Ansible
4. Act as a technical leader and mentor to platform engineers, SREs, DevOps teams, and application developers. Foster strong collaboration between infrastructure/platform teams and product/engineering groups
5. Produce high-quality, living documentation in line with best practices: architecture decision records (ADRs), design blueprints, runbooks, standards & guardrails, and platform user guides
6. Proactively identify, assess, and mitigate infrastructure risks, including security vulnerabilities, single points of failure, compliance gaps, operational drift, and cost overruns
Your Qualifications:
7. 10+ years in infrastructure engineering, platform engineering, SRE, or architecture roles, with 5+ years focused on production-scale Kubernetes and cloud platforms
8. Deep expertise in Kubernetes (architecture, operators, CNI/service mesh, multi-cluster federation, upgrades at scale)
9. Strong hands-on experience with major cloud providers (AWS, Azure) and hybrid/on-premises integrations
10. Proficiency in IaC (Terraform preferred), GitOps, containerization (Docker), CI/CD tooling, and observability (Prometheus, Grafana, ELK/OpenTelemetry)
11. Solid understanding of databases (relational, NoSQL, distributed), networking (VPC, SDN, load balancing, firewalls), and security best practices (RBAC, secrets management, encryption, vulnerability scanning)
12. Proven track record defining non-functional requirements and delivering high-availability, performant, secure systems
13. Excellent communication and leadership skills; experience mentoring teams and influencing stakeholders