ASSET Seminar: “Controlling Language Models”
March 26, 2025 at 12:00 PM - 1:15 PM
Details
Abstract:
Controlling language models is key to unlocking their full potential and making them useful for downstream tasks. Successfully deploying these models often requires both task-specific customization and rigorous auditing of their behavior. In this talk, I will begin by introducing a customization method called Prefix-Tuning, which adapts language models by updating only 0.1% of their parameters. Next, I will address the need for robust auditing by presenting a Frank-Wolfe-inspired algorithm for red-teaming language models, which provides a principled framework for discovering diverse failure modes. Finally, I will rethink the root cause of these control challenges, and propose a new generative model for text, called Diffusion-LM, which is controllable by design.
Zoom Link (if unable to attend in-person): https://upenn.zoom.us/j/93867005722

