Master's thesis: localisation of safety neurones in text2image models

Darmstadt

Fraunhofer-Gesellschaft

Posted online since: 17 March

Job responsibilities

The Fraunhofer Institute for Secure Information Technology SIT is one of the leading research and development institutions for cybersecurity in Germany and Europe and is part of ATHENE, the national research center for applied cybersecurity. ATHENE is a collaboration between the Fraunhofer Society, TU Darmstadt, Hochschule Darmstadt, and Goethe University Frankfurt. Our common goal: to make the world of tomorrow safer.

Be part of change

Large language models (LLMs) have gained significant attention recently due to their remarkable capabilities. A similar, yet less explored, field focuses on text-to-image (T2I) model architectures. Since a picture is worth more than a thousand words, one can assume that T2I models possess at least as many capabilities as text-to-text (T2T) models. It is therefore important to subject these models to an "alignment" process as well. In T2T models, some neurones in the architecture are repurposed as safety neurones. To better understand the alignment process, the locations of safety neurones in both architectures should be compared.

Objective: The work aims to investigate the localisations of safety neurones in T2I models and compare them with T2T models. For this purpose, the proposed approaches for the localisation of safety neurones from T2T models will be adapted and implemented for T2I models. Finally, results regarding the localisation and significance of the neurones are analysed to highlight similarities and differences in the architectures.

Results: The results of this work aim to conceptualise the differences in model architectures for LLM security research. Since T2I models receive less attention, it is important to motivate further research on these models. For this purpose, a comprehensive comparison of different architectures and their associated safety neurones will be conducted as part of the work. The results are verified by deactivating the identified neurones in a targeted manner and comparing the effect with that of deactivating randomly chosen neurones.
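
The verification step described above (deactivate the identified neurones, then compare against randomly deactivated ones) can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the method used in the thesis: the tiny network, its weights, the candidate set `candidates`, and the helper names `forward`, `ablation_effect`, and `random_baseline` are all hypothetical, and the mean absolute change in output stands in for whatever safety metric the actual work would use. Drawing the random baseline only from non-candidate neurones is an assumption made here for illustration.

```python
import random


def forward(x, w_in, w_out, ablated=frozenset()):
    """Tiny one-hidden-layer ReLU network; hidden units in `ablated` are zeroed."""
    hidden = []
    for j, row in enumerate(w_in):
        pre = sum(w * xi for w, xi in zip(row, x))
        act = max(0.0, pre)  # ReLU
        hidden.append(0.0 if j in ablated else act)
    return sum(w * h for w, h in zip(w_out, hidden))


def ablation_effect(inputs, w_in, w_out, ablated):
    """Mean absolute change in output when the given hidden units are zeroed."""
    total = 0.0
    for x in inputs:
        full = forward(x, w_in, w_out)
        cut = forward(x, w_in, w_out, ablated)
        total += abs(full - cut)
    return total / len(inputs)


def random_baseline(inputs, w_in, w_out, k, exclude, trials, rng):
    """Average effect of zeroing k random units (assumption: candidates excluded)."""
    pool = [j for j in range(len(w_in)) if j not in exclude]
    effects = []
    for _ in range(trials):
        chosen = frozenset(rng.sample(pool, k))
        effects.append(ablation_effect(inputs, w_in, w_out, chosen))
    return sum(effects) / trials


# Hypothetical toy model: 8 hidden units with identical input weights.
# Units 0 and 1 carry large output weights and play the role of the
# "identified" neurones; the rest contribute very little.
w_in = [[1.0, 0.5, 0.25] for _ in range(8)]
w_out = [5.0, 5.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
inputs = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [2.0, 0.0, 1.0]]
candidates = frozenset({0, 1})

targeted = ablation_effect(inputs, w_in, w_out, candidates)
baseline = random_baseline(
    inputs, w_in, w_out, k=len(candidates), exclude=candidates,
    trials=20, rng=random.Random(0),
)

print(f"targeted ablation effect: {targeted:.4f}")
print(f"random ablation effect:   {baseline:.4f}")
```

In this toy setup the two candidate units carry most of the output weight, so removing them shifts the output far more than removing a random pair of the same size. With a real T2I model, the same comparison would be applied to activations inside the network, for example by masking them during the forward pass.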

What you do with us:

Researching and implementing novel machine learning approaches that enhance the security of LLMs
Self-critical evaluation of the obtained results
Presenting the results
Preparing a project report in the form of a master's thesis

What you contribute

Knowledge in the field of Machine Learning, including training, inference, and optimisation of transformer architectures
Knowledge in the field of ML security is desirable.
Good Python skills, especially with PyTorch
Scientific interest and interest in current research projects

What we offer

Independent work schedule management
Insights into the intersection of academic research and industrial application

Related works:

(Relevant sections: 2.2, 2.7, 3.1, 3.2)

We value and promote the diversity of our employees' skills and therefore welcome all applications – regardless of age, gender, nationality, ethnic and social origin, religion, ideology, disability, sexual orientation and identity. Severely disabled persons are given preference in the event of equal suitability. Our tasks are diverse and adaptable – for applicants with disabilities, we work together to find solutions that best promote their abilities.

With its focus on developing key technologies that are vital for the future and enabling the commercial utilization of this work by business and industry, Fraunhofer plays a central role in the innovation process. As a pioneer and catalyst for groundbreaking developments and scientific excellence, Fraunhofer helps shape society now and in the future.

Ready for a change? Then apply now and make a difference! Once we have received your online application, you will receive an automatic confirmation of receipt. We will then get back to you as soon as possible and let you know what happens next.

Fraunhofer Institute for Secure Information Technology SIT

Requisition Number: 81219

Application Deadline:
