About EMBO

EMBO is a not-for-profit organization dedicated to promoting excellence in the life sciences in Europe and beyond. EMBO currently comprises a community of more than 2,100 EMBO Members. We fund talented researchers at all career stages, facilitate scientific exchanges through high-quality academic publishing, conferences, and lectures, and foster a research environment where scientists can achieve their best work. Our diverse international team of approximately 60 staff collaborates closely with one another and with external stakeholders to further our mission. EMBO is located on the international EMBL life sciences research campus in Heidelberg, Germany.

Your role

We seek a trainee with expertise in data science, machine learning and data visualization to develop AI-assisted tools that establish vector-based representations of the expertise of researchers and analyse them at scale in the context of the global scientific landscape. The trainee will work with developers from the Open Science Implementation team and in collaboration with the Membership & Elections team to deliver focused analyses. They will report to the Head of Open Science Implementation and the Head of Membership and Elections.

We want to derive data-driven insights into the composition of our membership and other relevant scientific communities (authors, reviewers, conference speakers, grant applicants, etc.) in terms of their areas of expertise as compared to the global life science landscape. To this end, we are developing AI-assisted tools to represent the scientific expertise of scientists, journals and conferences. The tools currently in development use NLP techniques, including tailored large language model embeddings, automated keyword and concept extraction, and graph-based data analysis. The end goal of the project is to share the results of the analyses with our community through advanced visualization methods.
You have

Experience in data science or machine learning, or you hold or are about to obtain a degree or certified training in a closely related field in the computational sciences;
Experience in structured, object-oriented programming in Python;
Experience with multi-dimensional data processing, analytics and visualization (for example, NumPy, SciPy, scikit-learn, …);
Knowledge of text processing (spaCy, NLTK) and some of the major transformer-based frameworks (HuggingFace, LangChain, LlamaIndex, OpenAI or Anthropic APIs);
The ability to work both autonomously and as part of a team.

You may also have

Contributed to open-source projects;
Experience with vector stores;
Experience in training or fine-tuning models with a major machine learning framework (PyTorch, TensorFlow);
Prior exposure to life sciences, computational biology or bioinformatics.

Contract length

6 months, renewable to 12 months total

Stipend

1500 euros / month