Requirements:
- 3 years of software engineering experience
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment
- Excellent oral and written communication skills for clear, structured evaluation rationales
- Candidates must be based in the US, Canada, or WEU countries (UK, Netherlands, Italy, Germany, …)

Responsibilities:
- Curate code examples, build solutions, and correct code for AI model training in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go
- Evaluate and refine AI-generated code to ensure efficiency, scalability, and reliability
- Collaborate with cross-functional teams to improve AI-driven coding solutions against industry performance benchmarks
- Build agents that can verify the quality of code and identify error patterns
- Form hypotheses about steps in the software engineering cycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on each
- Design verification mechanisms that can automatically verify a solution to a software engineering task

Technologies: AI, API, C#, FastAPI, Java, JavaScript, LLM, Python, Rust

More:
- Role: Remote Software Developer (LLM Evaluation)
- Salary: 200 - 300 USD per hour
- Tech stack: Python, JavaScript, C++, C#, Java, FastAPI
- Category: Python Developer / Engineer
- Location address: 548 Market Street, PMB 18282, San Francisco, United States
- Benefits & perks: Fully home office / remote work; flexible work time
- About Us: Based in San Francisco, California, Turing is the world's leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems.
Turing supports customers in two ways: by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers specializing in software engineering, logical reasoning, STEM, multilinguality, multimodality, and agents; and by helping enterprises turn AI from a proof of concept into proprietary intelligence that performs reliably and delivers measurable impact on the P&L.
- Project Overview: Create cutting-edge datasets for training, benchmarking, and advancing large language models; curate code examples; provide precise solutions and corrections across multiple languages; evaluate and refine AI-generated code; collaborate with researchers and cross-functional teams to improve enterprise-level AI-driven coding solutions.
- Engagement Details: Commitment: flexible, from a minimum of 10 hrs/week up to 40 hrs/week (partial PST overlap required); Type: Contractor (no medical/paid leave); Duration: 1 month (starting next week; potential extensions based on performance and fit)

Last updated: week 24 of 2026