Parallel Domain is building the world’s most advanced simulation and digital twin platform for autonomy, robotics, and computer vision. Our Replica product creates large-scale, photorealistic digital twins of real-world environments used for testing, validation, and development of autonomous systems.
We are hiring a Machine Learning Data Engineer to build and scale the data pipelines that support Replica and ML model development. You will ensure that data flows efficiently from raw customer inputs into validated, structured formats suitable for training, evaluation, and production systems.
Own data ingestion: Build reliable pipelines to normalize and validate customer and synthetic data.
Define data standards: Implement tools for dataset filtering, versioning, and annotation support.
Generate high-quality data feeds for training and evaluation across ML models.
Data engineering experience: Proven experience building scalable data pipelines and tooling.
Understanding of how data is used in model training and evaluation.
Practical experience with 3D concepts, geometry, and the linear algebra principles underpinning computer vision.
Strong Python proficiency and comfort with large datasets.
Experience working closely with ML engineers on data needs.
MS or PhD in ML, computer vision, robotics, or a related field.
Robotics data knowledge: Experience handling camera, lidar, or radar data.
Familiarity with data visualization tools such as Foxglove, Rerun, or Voxel51.
A dynamic and supportive work environment where your ideas are valued.
If you're passionate about machine learning, 3D reconstruction, generative AI, and the future of autonomous systems, we'd love to hear from you.