Job Description
Staff Research Engineer (LLM Pre-Training)
We value engineers who:
* Plan projects and make decisions independently, consulting others when needed.
* Identify customer needs and prioritize tasks accordingly.
* Start with the simplest solution and add complexity only as needed.
* Take full ownership of an entire subsystem.
* Are passionate about learning and staying up to date with the latest developments in the LLM field.
In this role, you will:
* Work with stakeholders to convert business requirements into technical specifications.
* Train LLMs from scratch on a large GPU cluster.
* Collect and process pre-training and fine-tuning datasets.
* Support and improve existing subsystems.
How we develop JetBrains AI:
* A cluster of hundreds of NVIDIA GPUs as training infrastructure.
* Git for source control management.
* Python, PyTorch, and HuggingFace as an ML stack.
* Kubeflow for pipeline orchestration and Weights & Biases for experiment tracking.
* TeamCity as a CI automation system.
Requirements
We’ll be happy to have you on our team if you have:
* Experience in design, deployment, and support of production ML systems.
* A strong theoretical background in NLP and transformer-based approaches.
* Proficiency with modern deep learning frameworks such as PyTorch and common libraries for NLP.
* Experience in distributed training of multi-billion parameter models.
* Attention to detail in everything you do and great communication skills.
We’d be especially thrilled if you have experience with:
* LLM inference frameworks such as vLLM, DeepSpeed, TensorRT.
* LLM alignment techniques such as RLHF/RLAIF.
* MLOps tools and practices, including CI/CD for ML.
* K8s and Kubeflow.
* Publishing scientific papers in the NLP field.
What we offer