Why We Are Hiring for This Role:
- To build and maintain robust perception pipelines for our humanoid robots.
- To integrate state-of-the-art vision models with real-time control and decision-making.
- To ensure reliable scene understanding in diverse, real-world environments.
- To develop fast and accurate pipelines for visual grasping, navigation, and human-robot interaction.
- To support continuous learning from real-world deployment data.
- To develop vision-language models (VLMs).

What Kind of Person We Are Looking For:
- Strong background in computer vision, with a focus on deep learning-based methods.
- Hands-on experience with modern vision architectures (e.g., ViTs, SAM, DINO, diffusion models, Mask2Former).
- Proficient with visual SLAM, 6D pose estimation, object detection, and segmentation.
- Comfortable working with real-time video data on robots in uncontrolled environments.
- Experience integrating vision with action and control systems in robotics.
- Fluent in Python and PyTorch; C++ is a plus.
- Familiarity with ROS2, NVIDIA Isaac, or similar robotics frameworks.
- Can train, fine-tune, and evaluate large vision models, including on custom datasets.
- Understands sensor calibration, synchronization, and multi-camera setups.
- Collaborates well with AI, control, and hardware teams to close the loop from perception to action.
- Bachelor's or Master's degree in Computer Vision, Robotics, or Machine Learning; a PhD is a strong plus.

Benefits
We provide market-standard benefits (health, vision, dental, 401k, etc.). Join us for the culture and the mission, not for the benefits.

Salary
The annual compensation is expected to be between $80,000 and $1,000,000. Exact compensation may vary based on skills, experience, and location.