Your Responsibilities:
* Investigate, develop, and apply advanced quantization (8-bit, 4-bit, mixed precision), pruning, and distillation techniques to derive optimized models for NXP NPU targets
* Accelerate on-device inference performance
* Investigate methods for improving the performance of small language models to enable tiny agents at the edge, while ensuring they adhere to safety principles
* Deploy optimized models using Ollama, llama.cpp, ONNX Runtime, and TFLite for efficient NPU inference
* Design benchmarking pipelines to assess the on-device performance of Generative and Agentic AI systems
* Develop demonstrators and proofs of concept
* Move key technologies from research into product solutions
Your Qualifications:
* Solid experience in software/AI engineering with deep exposure to LLMs, VLMs, and systems performance
* Experience with LLM quantization techniques (e.g., SmoothQuant, SpinQuant, QuaRot), pruning methods (e.g., Wanda, SparseGPT), and other system-level optimizations such as speculative decoding
* Proven track record working with AI frameworks (PyTorch, TensorFlow, etc.) required
* Experience with Agentic AI technologies and familiarity with existing frameworks (e.g., LangChain, Google ADK, SmolAgents)
* Understanding of AI toolchains, deployment, portability, and inference engines (CUDA, TensorRT, TFLite, ONNX, Ollama, etc.) preferred
* Affinity for and experience with embedded systems and NPU accelerators required
* Experience with embedded software architecture, build systems, and version control systems required
* Broad experience with GNU/Linux operating systems, embedded systems, development boards, and processors, along with general software competencies, required
* Familiarity with setting up and maintaining related MLOps development environments (MLflow, ClearML, etc.) required
* Solid programming experience in C, C++, Python, and Bash on Linux systems required
Your Benefits:
* You will work in an international environment
* A highly renowned company
* Interesting tasks in a multinational setting