About the Role

As an ML Engineer at SEMRON, you will be responsible for developing the training infrastructure that enables models to run efficiently on our novel analog in-memory compute platform. A core part of this work is designing a geo-distributed Quantization-Aware Training (QAT) framework that lets the machine learning community collectively contribute compute resources, so that anyone can quantize their favorite models and make them compatible with SEMRON’s hardware.

What you will do:
- Design and implement a geo-distributed QAT system for preparing models for analog inference
- Build collaborative training and tooling infrastructure that allows users to contribute GPUs and quantize models together
- Translate new quantization and analog-aware training methods into robust engineering components
- Collaborate with researchers to integrate their algorithms into a production-grade pipeline

What you should bring:
- A Master’s degree, PhD, or a personal project or open-source contribution that clearly demonstrates strong engineering skills and a solid understanding of ML tooling
- Fluency in Python and PyTorch
- An understanding of training workflows and practical experience with model optimization
- The ability to architect and maintain scalable systems, with clean code and clear interfaces
- A collaborative mindset and comfort working across software, research, and hardware domains

Helpful but not required:
- Prior experience with QAT or other model compression techniques
- Familiarity with projects like DiLoCo, SWARM, or other decentralized or peer-to-peer learning systems
- Background in compiler stacks, graph transformations, or model deployment tooling
- Contributions to open-source ML infrastructure or research software

About us

At SEMRON, we’re redefining what’s possible in AI hardware.
Our core innovation lies in analog in-memory computing for deep neural network acceleration, enabling us to build compute architectures that scale vertically into the third dimension, much like NAND flash revolutionized memory. This leap in physical density allows us to deploy models with billions of parameters on chip areas as small as a few square millimeters.
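For candidates less familiar with QAT: the usual core mechanism is fake quantization with a straight-through estimator, where the forward pass sees quantization error but gradients still flow to the full-precision weights. The minimal PyTorch sketch below is purely illustrative of that general technique (the function name and bit-width choices are our own, not SEMRON's actual pipeline):

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: snap values to an integer
    # grid, then dequantize, so the forward pass sees quantization error.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    x_q = q * scale
    # Straight-through estimator: the backward pass treats quantization
    # as the identity, so gradients reach the full-precision weights.
    return x + (x_q - x).detach()

# Usage: quantize weights on the fly during a linear layer's forward pass.
w = torch.randn(4, 4, requires_grad=True)
y = torch.nn.functional.linear(torch.randn(2, 4), fake_quantize(w))
y.sum().backward()  # w.grad is populated despite the rounding step
```

Training with quantization in the loop like this lets the model adapt to the reduced precision, which is what makes the resulting weights deployable on low-precision (here, analog) inference hardware.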