PlateWise is a Denmark- and Germany-based startup developing digital solutions that make the sustainability of food procurement in professional kitchens measurable and transparent. We are looking for a motivated Master's student in Data Science / Machine Learning / Natural Language Processing, currently enrolled at a German university, who wants to take on a real-world industry challenge and turn it into a Master's thesis with immediate practical impact.
The Challenge
You will design and evaluate an AI-driven document processing pipeline for procurement data, which arrives in two formats:
* Semi-structured Excel files with varying schemas and layouts
* Unstructured PDF supplier invoices in German or Danish
The focus will be on OCR, NLP, schema detection, and data normalization, transforming heterogeneous inputs into a standardized, machine-readable structure for our platform. As most of our procurement data is in Danish, proficiency in Danish is a strong bonus, but not a strict requirement.
What you will work on
* Building and testing an applied AI pipeline using OCR and NLP (e.g., LayoutLM, Donut)
* Handling multilingual data extraction (German/Danish)
* Tackling technical challenges in layout variability, field mapping, and data validation
* Delivering results that will directly shape the PlateWise platform and support sustainability in the food service sector
Why this matters
This project is not just academic: it directly helps professional kitchens understand and reduce their environmental footprint. You'll combine cutting-edge research in document intelligence with a tangible impact on sustainability.
Details
* Start: Immediate
* Duration: To be completed within 2025
* Location: Remote-first (Germany or Denmark)
* Enrollment: Must be enrolled at a German university
* Compensation: Possible (mini-job basis)
If you're excited about applied AI, multilingual NLP, and contributing to a sustainable food system, we'd love to hear from you.
Academic Background
* Enrolled at a German university (requirement)
* Master's program in Computer Science, Data Science, Artificial Intelligence, Machine Learning, or a related field
Technical Skills
* Python (essential, as most OCR/NLP frameworks are Python-based)
* Familiarity with OCR libraries (e.g., Tesseract, PaddleOCR, EasyOCR)
* Experience with NLP/Document AI models (e.g., Hugging Face Transformers, LayoutLM, Donut, spaCy)
* Knowledge of data wrangling & schema alignment (e.g., pandas, PySpark, or database experience)
* Understanding of machine learning workflows (training, evaluation, error analysis)
* Bonus: familiarity with cloud-based ML pipelines (e.g., AWS, GCP, Azure, or open-source orchestration tools)
Language & Domain Knowledge
* Strong English skills (working language)
* Danish proficiency is a strong plus (data is primarily Danish)
* German helpful but not essential
* Interest in and knowledge of food systems are a strong plus
Research & Analytical Skills
* Ability to review academic literature in OCR/NLP and apply findings
* Comfortable with experiment design & evaluation (accuracy, robustness, computational efficiency)
* Curious mindset for tackling multilingual, unstructured data problems
Soft Skills
* Self-driven and proactive (it's partly independent research)
* Structured, analytical approach
* Passion for sustainability and applied AI
* Clear communication of findings (both in writing and in presentations)