
Freelance Agent Evaluation Engineer

Stuttgart
Mindrift
Engineer
Listing online since: 28 April
Description

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

We're building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.

You'll create challenging tasks and evaluation criteria within realistic simulated environments:

* Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
* Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
* Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
* Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust

What this is NOT

* Not data labeling
* Not prompt engineering
* Not writing code from scratch - the agent writes most of the code; you guide and evaluate

What we look for

* 5+ years in software development
* Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
* Experience writing tests (functional, integration)
* English proficiency - B2+

Why this is hard

Frontier models are already good at coding. Creating a task that genuinely challenges the best models is non-trivial. You need to deeply understand where models fail and what scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
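
As a rough illustration of the last point, the sketch below checks a solution against its observable contract rather than against one reference output. The task (deduplicating user records by case-insensitive email), the function name `check_dedupe`, and the three sample solutions are all hypothetical and are not taken from the project; they only show the general idea under those assumptions.

```python
from typing import Callable, Iterable

Record = dict[str, str]


def check_dedupe(solution: Callable[[list[Record]], Iterable[Record]]) -> bool:
    """Return True if `solution` meets the (hypothetical) task contract."""
    records = [
        {"email": "Ann@example.com", "name": "Ann"},
        {"email": "bob@example.com", "name": "Bob"},
        {"email": "ann@example.com", "name": "Ann B."},
    ]
    # Pass copies so a solution that mutates its input cannot affect the check.
    result = list(solution([dict(r) for r in records]))
    emails = sorted(r["email"].lower() for r in result)
    # Contract 1: exactly one record per case-insensitive email, in any order.
    if emails != ["ann@example.com", "bob@example.com"]:
        return False
    # Contract 2: every returned record is one of the inputs, nothing invented.
    return all(r in records for r in result)


def keep_first(records: list[Record]) -> list[Record]:
    """Valid approach: keep the first record seen for each email."""
    seen: dict[str, Record] = {}
    for r in records:
        seen.setdefault(r["email"].lower(), r)
    return list(seen.values())


def keep_last_sorted(records: list[Record]) -> list[Record]:
    """Also valid: keep the last record per email and return them sorted."""
    latest = {r["email"].lower(): r for r in records}
    return sorted(latest.values(), key=lambda r: r["name"])


def case_sensitive(records: list[Record]) -> list[Record]:
    """Incorrect: treats 'Ann@example.com' and 'ann@example.com' as distinct."""
    seen: dict[str, Record] = {}
    for r in records:
        seen.setdefault(r["email"], r)
    return list(seen.values())
```

A test that compared the result to one exact list would reject `keep_last_sorted` even though it satisfies the task; checking the contract accepts both valid approaches and still rejects `case_sensitive`.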

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Each task in this project is estimated to take about 20 hours, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. To be accepted, a task must be submitted by the deadline and meet the listed acceptance criteria.

Compensation

Up to $50/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.

