Version 1.0
Talk: Can mechanistic interpretability tools for LLMs help us understand the brain?

It has been discovered that large language models (LLMs) can predict brain activity in response to natural language stimuli. However, we still don't fully understand which specific parts of these models are responsible for this similarity to brain activity. Meanwhile, the field of Mechanistic Interpretability (MI) has been attempting to reverse engineer LLMs to understand the mechanisms underlying how they process different tasks. For my thesis, I've started exploring how these two fields can benefit from each other.
Methods:
- Use Activation Patching and Edge Attribution Patching to find circuits important for specific NLP tasks (a minimal patching sketch follows this list).
- Ablate these circuits (or everything but them) and measure the impact on brain–model alignment using encoding analyses.
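To make the patching step concrete, here is a minimal sketch of activation patching using plain PyTorch forward hooks on GPT-2 via Hugging Face transformers. The prompt pair, the patched layer (`LAYER = 6`), and the answer token are illustrative placeholders rather than anything from the talk; Edge Attribution Patching would approximate the effect of many such patches at once via gradients instead of rerunning the model for each one.

```python
# Minimal activation-patching sketch (illustrative; not the talk's actual setup).
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Clean/corrupted prompt pair with identical token lengths (hypothetical example).
clean = tokenizer("The capital of France is", return_tensors="pt")
corrupt = tokenizer("The capital of Italy is", return_tensors="pt")

LAYER = 6          # which transformer block to patch -- an arbitrary illustrative choice
cached = {}        # holds the clean activation of that block

def cache_hook(module, inputs, output):
    # GPT2Block returns a tuple; output[0] is the residual-stream hidden state
    cached["h"] = output[0].detach().clone()

def patch_hook(module, inputs, output):
    # Overwrite the corrupted run's hidden state at this block with the clean one
    return (cached["h"],) + output[1:]

block = model.transformer.h[LAYER]

# 1) Clean run: cache this block's output activation
handle = block.register_forward_hook(cache_hook)
with torch.no_grad():
    model(**clean)
handle.remove()

# 2) Corrupted run with the clean activation patched in at the same block
handle = block.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(**corrupt).logits
handle.remove()

# 3) Corrupted run without patching, as a baseline
with torch.no_grad():
    corrupt_logits = model(**corrupt).logits

# How much does patching this single block restore the "clean" answer?
answer_id = tokenizer(" Paris")["input_ids"][0]
print("patched logit for ' Paris':", patched_logits[0, -1, answer_id].item())
print("corrupt logit for ' Paris':", corrupt_logits[0, -1, answer_id].item())
```

Sweeping this over layers and positions (or, for attribution patching, over edges between components) is what localises a candidate circuit; ablation then removes exactly those components, for example by zero- or mean-replacing their activations.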
Hypotheses:
- Corrupting task-relevant circuits should reduce alignment with brain activity (a sketch of this comparison follows the list).
- Preserving these circuits while corrupting everything else should maintain alignment.
- Different circuits/tasks may correspond to different brain regions.
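These hypotheses reduce to comparing an encoding score across ablation conditions. The sketch below does that with cross-validated ridge regression and voxel-averaged correlations; note that the "brain" responses here are random placeholders so the script runs on its own, the ablation is a crude zeroing of a feature block rather than a real circuit, and RidgeCV stands in for whatever encoding pipeline the actual analysis uses.

```python
# Encoding-analysis sketch with placeholder data (illustrative only).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Placeholder sizes: 200 stimuli, 768 model features (GPT-2 hidden size), 50 voxels.
n_stimuli, n_features, n_voxels = 200, 768, 50

# LLM activations for the intact model, plus a crude "ablated" version in which
# one block of features is zeroed out as a stand-in for removing a circuit.
X_intact = rng.standard_normal((n_stimuli, n_features))
X_ablated = X_intact.copy()
X_ablated[:, :256] = 0.0

# Placeholder "brain" responses: a noisy linear readout of the intact activations.
# A real analysis would use recorded fMRI/MEG responses to the same stimuli.
W = rng.standard_normal((n_features, n_voxels))
Y = X_intact @ W * 0.1 + rng.standard_normal((n_stimuli, n_voxels))

def encoding_score(X, Y, n_splits=5):
    """Cross-validated, voxel-averaged correlation between predicted and held-out responses."""
    fold_scores = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        ridge = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X[train], Y[train])
        pred = ridge.predict(X[test])
        r_per_voxel = [np.corrcoef(pred[:, v], Y[test, v])[0, 1] for v in range(Y.shape[1])]
        fold_scores.append(np.mean(r_per_voxel))
    return float(np.mean(fold_scores))

print("alignment, intact model   :", encoding_score(X_intact, Y))
print("alignment, circuit ablated:", encoding_score(X_ablated, Y))
```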
Our first results are very modest, but we keep exploring :)
Info
Date:
14.11.2025
Start time:
11:30
Duration:
00:30
Room:
M2.31
Track:
Computational Linguistics
Language:
en
Links:
Speakers
Nursulu Sagimbayeva
