
ASSET Seminar: “Provable vs Impossible Trust: Reasoning, Steering, and Safety”

September 3, 2025 at 12:00 PM - 1:15 PM
Details
Date: September 3, 2025
Time: 12:00 PM - 1:15 PM
Event Category: Seminar
  • Organizer
    AI-enabled Systems: Safe, Explainable, and Trustworthy (ASSET) Center
    Venue
    Amy Gutmann Hall, Room 414, 3333 Chestnut Street
    Philadelphia
    19104
    Abstract: In this talk, I will discuss highlights from our recent work in trustworthy AI:
    (1) Certifying reasoning explanations with reliability guarantees and aligning them with expert knowledge,
    (2) Simple yet effective steering inspired by theoretical rule-following mechanisms for transformers, and
    (3) The impossibility of monitoring stateless attackers and what safety defenses should be doing.

     

    Seminar Recording: https://drive.google.com/file/d/1FNeVVPXb_vZiNWVexFTgTFoVKBM_QnqQ/view?usp=sharing