About the project

The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, continuously growing load.

Infrastructure:
- ~100 servers (bare metal VPS)
- active use of IaC
- Kubernetes clusters in production
- focus on stability, observability, and automation

The project is long-term: not a hype startup, but a mature product with real users.

What the work looks like

This is a hands-on role with a clear time allocation:
- 60%: operations and incidents (including helping teams)
- 20%: infrastructure automation
- 20%: prototyping, improvements, technical initiatives

There is on-call responsibility, but normally after-hours incidents happen 2–3 times a year, not every week.

Responsibilities

- Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
- Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
- Monitoring, alerting, backups, and regular recovery checks
- Development of service and infrastructure automation
- Development of CI/CD and release procedures
- Incident diagnosis and resolution, support for product teams
- Traffic analytics, bot and attack protection tools
- Responsibility for 24/7 platform stability

Requirements

What's important
- 4 years of experience operating Linux/Ubuntu infrastructure and production services
- Strong understanding of networking and troubleshooting
- Kubernetes (cluster operations), Rancher, Docker / containerd
- Hands-on experience with Ansible and Terraform
- Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
- CI/CD: Jenkins
- Automation: Bash, Python
- Experience working with LVM

Nice to have
- Experience working with blockchain nodes
- Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
- Providers: Hetzner / OVHcloud
- Cloudflare (edge, DDoS), experience with AWS
- Handling abuse tickets with hosting providers

Technology stack

- VPN: WireGuard, OpenVPN
- Databases: ClickHouse, MongoDB, Redis, PostgreSQL
- Applications: Node.js (pm2), php-fpm, Lua, Tarantool
- Supporting services: Go (operatorSDK), Ruby, Node.js, PHP

Benefits

- 5,000 – 8,000 € net
- Format: office / hybrid / remote
- Location: Spain (Barcelona and suburbs) or remote (CET ±2)
- Full-time
- Opportunity to genuinely influence architecture and processes
- Mature engineering team and reasonable expectations