ASSET Seminar: “How do LLMs generalize on out-of-distribution tasks? Insights from models’ internal representations”
September 24, 2025 at 12:00 PM - 1:15 PM
A mystery of large language models (LLMs) is their ability to solve novel tasks, notably from just a few demonstrations in the prompt (in-context learning). Such tasks often require the model to generalize far beyond its training distribution, raising the question: how do LLMs achieve this form of out-of-distribution (OOD) generalization? For example, in symbolized language reasoning, names and labels are replaced by arbitrary symbols, yet the model can infer the correct name-label mapping without any finetuning.
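As an illustration of such a symbolized task, consider the minimal Python sketch below. It is a hypothetical prompt, not necessarily the exact format studied in the talk: the review/sentiment framing and the symbols “@#” and “%&” are invented stand-ins for real labels, and the model must infer the symbol-to-label mapping purely from the in-context demonstrations.

```python
# Hypothetical symbolized in-context-learning prompt (illustration only).
# The arbitrary symbols "@#" and "%&" replace the labels "positive"/"negative";
# the mapping is never stated and must be inferred from the demonstrations.
demonstrations = [
    ("The movie was wonderful and moving.", "@#"),      # stands in for "positive"
    ("The plot was dull and the acting flat.", "%&"),   # stands in for "negative"
    ("A delightful, heartwarming story.", "@#"),
]
query = "I regretted every minute of it."

prompt = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in demonstrations)
prompt += f"\nReview: {query}\nLabel:"
print(prompt)  # a capable LLM is expected to continue with "%&"
```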
In this talk, I will open the black box of LLMs and reveal how three facets of LLM behavior are interconnected: emergent phenomena during training, OOD generalization, and a model’s representation of compositions. Focusing on induction heads, I will show that learning the right compositional structure is key to OOD generalization, and that this learning process exhibits sharp transitions in the training dynamics. Further, I will propose the “common bridge representation hypothesis”: a latent subspace of the embedding space acts as a bridge that aligns multiple attention heads across early and later layers, and this may be the key geometric structure underlying the success of transformers.
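For readers unfamiliar with induction heads, the sketch below illustrates the standard pattern-completion rule they are described as implementing in the mechanistic-interpretability literature ([A][B] ... [A] predicts [B]). It is a plain-Python illustration of that rule, not code or results from the talk.

```python
# Minimal sketch of the induction-head rule (illustration, not the speaker's code):
# given the current token, find its most recent earlier occurrence in the context
# and predict the token that followed it, i.e. [A][B] ... [A] -> [B].
def induction_prediction(tokens):
    """Return the token that followed the most recent earlier occurrence of the last token."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence, so the rule makes no prediction

print(induction_prediction(["Mr", "Dur", "sley", "...", "Mr", "Dur"]))  # -> "sley"
```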

