ASSET Seminar: “Provable vs Impossible Trust: Reasoning, Steering, and Safety”
Amy Gutmann Hall, Room 414
3333 Chestnut Street, Philadelphia, United States
Abstract: In this talk, I will discuss a selection of highlights from our recent work on trustworthy AI: (1) certifying reasoning explanations with reliability guarantees and aligning them with expert knowledge; (2) simple yet effective steering inspired by theoretical rule-following mechanisms for transformers; and (3) the impossibility of monitoring stateless attackers, and what safety defenses should […]

