You'll build and own the cloud infrastructure that Causa Prima runs on — from CI/CD pipelines to production monitoring, security hardening, and cost optimization. You'll set the foundation for a platform handling sensitive financial data at scale.
What you'll do
* GCP infrastructure — Cloud Run (API), GKE Autopilot (agents, GPU nodes), VPC with private subnets, dedicated Cloud SQL instances, IAM, Secret Manager. Infrastructure-as-code.
* CI/CD — GitHub Actions + Cloud Build, security-aware pipeline design, production approval gates, container image scanning, secret isolation, signed commits.
* Observability — OpenTelemetry distributed tracing across TypeScript and Python services, Cloud Monitoring, Sentry with PII-stripping hooks, structured logging with sanitization, per-agent behavioural monitoring, tiered alerting.
* Secret management & rotation — Credential lifecycle for LLM API keys, database credentials, OAuth tokens, and agent signing keys in GCP Secret Manager.
* Container orchestration — Docker builds, registry management, GKE cluster configuration. Design the path toward Kubernetes-native deployment as we scale.
* Incident response infrastructure — Per-agent circuit breakers, graceful degradation, tiered alerting (logged → Slack → PagerDuty), forensic tooling via event store replay and traces.
* Network security — VPC firewall rules, private ingress for all data stores, egress controls, PII Vault on restricted-access infrastructure.
* Neo4j Aura operations — Monitoring, scaling decisions, and backup verification for the managed graph database.
What we're looking for
* 5+ years in DevOps, infrastructure, or SRE roles supporting production systems.
* Strong systems design skills — you think in deployment topologies, failure domains, blast radius, and operational security.
* Production experience with GCP (Cloud Run, GKE, Cloud SQL, IAM, Secret Manager), or with an equivalent cloud platform and a willingness to go deep on GCP.
* Hands-on experience with Kubernetes in production — cluster management, networking, scaling, security policies.
* Experience with infrastructure-as-code: Terraform, Pulumi, Ansible, or similar. Ideally more than one.
* Experience designing CI/CD pipelines with security in mind — secret isolation, approval gates, image scanning, deployment strategies.
* Experience with observability systems — distributed tracing, structured logging, alerting hierarchies, dashboarding.
* Security awareness at the infrastructure level — you think about network isolation, least-privilege IAM, and credential hygiene as defaults.
* Strong code review skills for infrastructure-as-code and deployment configuration.
* Nice to have:
  * Event streaming infrastructure (Kurrent, Redpanda, Kafka).
  * SOC 2 or GDPR compliance from an infrastructure perspective.
  * Fintech or regulated-environment background.