Events
Past Event
WED@NICO SEMINAR: Asma Ghandeharioun, Google DeepMind "Model Interpretability: from Illusions to Opportunities"
Northwestern Institute on Complex Systems (NICO)
12:00 PM
Lower Level, Chambers Hall
Details
Speaker:
Asma Ghandeharioun, Senior Research Scientist, People + AI Research Team, Google DeepMind
Title:
Model Interpretability: from Illusions to Opportunities
Abstract:
While the capabilities of today’s large language models (LLMs) are reaching—and even surpassing—what was once thought impossible, concerns remain regarding their misalignment, such as generating misinformation or harmful text, which continues to be an open area of research. Understanding LLMs’ internal representations can help explain their behavior, verify their alignment with human values, and mitigate instances where they produce errors. In this talk, I begin by challenging common misconceptions about the connections between LLMs' hidden representations and their downstream behavior, highlighting several “interpretability illusions.” For example, I demonstrate that, counterintuitively, localizing and editing facts within an LLM’s hidden representations can be disconnected; model failure and success in the wild cannot necessarily be predicted based on a relatively faithful proxy at training time; and even within the same architecture, representation similarity is not always indicative of prediction similarity.
Next, I introduce Patchscopes, a new framework that leverages the model itself to explain its internal representations in natural language. I’ll show how it can be used to answer a wide range of questions about an LLM's computation. I also demonstrate that many prior interpretability methods—based on projecting representations into the vocabulary space and intervening in LLM computation—can be viewed as instances of this framework. Furthermore, several of their shortcomings, such as difficulty inspecting early layers or lack of expressivity, can be mitigated by Patchscopes. Beyond unifying prior inspection techniques, Patchscopes opens up new possibilities, such as using a more capable model to explain the representations of a smaller model and multihop reasoning error correction.
Finally, I discuss a few failure cases in today’s most capable LLMs and show how Patchscopes can shed light on their mechanics and suggest mitigation strategies. For example, we observe that safety-tuned models may still divulge harmful information, and whether they do so often depends significantly on who they are talking to, which we refer to as the user persona. Using Patchscopes, we show that harmful content can persist in hidden representations and can be easily extracted. Additionally, we demonstrate that certain user personas can induce the model to form more charitable interpretations of otherwise dangerous queries.
Speaker Bio:
Asma Ghandeharioun, Ph.D., is a senior research scientist with the People + AI Research team at Google DeepMind. She works on aligning AI with human values through better understanding [1] and controlling (language) models [2], uniquely by demystifying their inner workings [3] and correcting collective misconceptions along the way [4, 5]. While her current research is mostly focused on machine learning interpretability, her previous work spans conversational AI, affective computing, and, more broadly, human-centered AI. She holds a doctorate and master’s degree from MIT and a bachelor’s degree from the Sharif University of Technology. She has been trained as a computer scientist/engineer and has research experience at MIT, Google Research, Microsoft Research, and Ecole Polytechnique Fédérale de Lausanne (EPFL), to name a few.
Her work has been published in premier peer-reviewed machine learning venues such as ICLR, NeurIPS, ICML, EMNLP, AAAI, ACII, and AISTATS. She has received awards at NeurIPS and her work has been featured in Wired, Wall Street Journal, and New Scientist.
Location:
In person: Chambers Hall, 600 Foster Street, Lower Level
Remote option: https://northwestern.zoom.us/j/91475935376
Passcode: NICO24
About the Speaker Series:
Wednesdays@NICO is a vibrant weekly seminar series focusing broadly on the topics of complex systems, data science, and network science. It brings together attendees ranging from graduate students to senior faculty who span all of the schools across Northwestern, from applied math to sociology to biology and every discipline in between. Please visit https://bit.ly/WedatNICO for information on future speakers.
Time
Wednesday, October 9, 2024 at 12:00 PM - 1:00 PM
Location
Lower Level, Chambers Hall
Contact
Calendar
Northwestern Institute on Complex Systems (NICO)
No classes - Memorial Day - University offices are closed
University Academic Calendar
All Day
Details
No classes - Memorial Day - University offices are closed
Time
Monday, May 25, 2026
Contact
Calendar
University Academic Calendar
Data Science Nights - MAY 2026 - Speaker: Xudong Tang, Computer Science and NICO
Northwestern Institute on Complex Systems (NICO)
5:30 PM
M416, Technological Institute
Details
MAY MEETING: Thursday, May 28, 2026 at 5:30pm (US Central)
LOCATION:
ESAM Conference Room, Tech M416
2145 Sheridan Road, Evanston, IL 60208
AGENDA:
5:30pm - Meet and greet with refreshments
6:00pm - Talk with Xudong Tang, PhD Student, Computer Science, NICO, and the Human-AI Collaboration Lab, Northwestern University
TALK TITLE:
Human and Machine Perception of Voice Similarity
ABSTRACT:
Modern voice cloning systems generate synthetic speech that listeners frequently cannot identify as being synthetic. But a voice can sound natural without sounding like the intended person, and what determines whether a clone is heard as a particular person is an open question. Here we report a large-scale preregistered experiment in which we collected 92,239 responses from 175 participants on their perception of pairs of real recordings, voice clones, and continuously morphed voices drawn from 100 contemporary celebrities across 20 speaker groups. We find that voice clones do not reliably preserve perceived speaker identity, reducing same-speaker judgments by 12.7 percentage points even though the clones are produced by a state-of-the-art text-to-speech model, while leaving different-speaker judgments unchanged. Using continuously morphed stimuli, we find that speakers vary substantially in how much variation their perceived identity tolerates, and that this variation is not predicted by speaker demographics. Speaker embeddings account for 58.9% (95% CI = [55.7, 61.9]) of variance in identity judgments, which is more than acoustic features, social attributes, and clone status combined. Once all these observed features are accounted for, clone status adds no additional predictive power. These results show that the perceptual impact of voice cloning is positional rather than categorical: we can model how listeners judge a voice by how close it falls to the perceptual boundary that defines each speaker's recognizable voice, applying the same criterion to real and synthetic speech alike.
DATA SCIENCE NIGHTS are monthly meetings featuring presentations and discussions about data-driven science and complex systems, organized by Northwestern University graduate students and scholars. Students and researchers of all levels are welcome! For more information: http://bit.ly/nico-dsn
FUTURE DATES:
Data Science Nights will return in September!
Time
Thursday, May 28, 2026 at 5:30 PM - 7:00 PM
Location
M416, Technological Institute
Contact
Calendar
Northwestern Institute on Complex Systems (NICO)
Spring 2026 Commencement
University Academic Calendar
All Day
Details
Spring 2026 Commencement
Time
Sunday, June 14, 2026
Contact
Calendar
University Academic Calendar
Juneteenth - University Closed
University Academic Calendar
All Day
Details
Juneteenth - University Closed
Time
Friday, June 19, 2026
Contact
Calendar
University Academic Calendar
Independence Day (observed) - University Closed
University Academic Calendar
All Day
Details
Independence Day (observed) - University Closed
Time
Friday, July 3, 2026
Contact
Calendar
University Academic Calendar
Fall 2026 Classes Begin
University Academic Calendar
All Day
Details
Fall 2026 Classes Begin
Time
Wednesday, September 23, 2026
Contact
Calendar
University Academic Calendar