Job Title: AI Red Teaming & LLM Quality Assurance Specialist
Type: Freelance | Remote
Language Requirement: German speakers only
Engagement: 25–30 hours per week
Job Description
We are seeking a highly analytical, detail-oriented professional with experience in red teaming, prompt evaluation, and AI/LLM quality assurance. The selected candidate will play a key role in testing and evaluating AI-generated content to identify vulnerabilities, assess risks, and ensure compliance with safety, ethical, and quality standards.
Key Responsibilities
* Conduct red teaming exercises to identify adversarial, harmful, or unsafe outputs from large language models (LLMs).
* Evaluate and stress-test AI prompts across multiple domains (e.g., finance, healthcare, security).
* Develop and apply test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential.
* Collaborate with researchers and engineers to report risks and suggest mitigations.
* Perform manual QA and content validation across model versions, ensuring accuracy, coherence, and adherence to guidelines.
* Create evaluation frameworks and scoring rubrics for prompt performance and safety compliance.
* Document findings, edge cases, and vulnerability reports with clarity and structure.
Requirements
* Proven experience in AI red teaming, LLM safety testing, or adversarial prompt design.
* Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.
* Strong background in Quality Assurance, content review, or test case development.
* Understanding of LLM behaviors, failure modes, and evaluation metrics.
* Excellent critical thinking, writing, and problem-solving skills.
* Ability to work independently and meet deadlines.
Preferred Qualifications
* Prior work with OpenAI, Anthropic, Google DeepMind, or other LLM safety initiatives.
* Experience in risk assessment, red-team security testing, or AI policy and governance.
* Background in linguistics, psychology, or computational ethics is a plus.