Name: ASSET Seminar: “Getting Lost in ML Safety Vibes”
Start: 2025-04-02T12:00:00-04:00
End: 2025-04-02T13:15:00-04:00
Location: Amy Gutmann Hall, Room 414

ASSET Seminar: “Getting Lost in ML Safety Vibes”

April 2, 2025 at 12:00 PM - 1:15 PM

Details

Date: April 2, 2025

Time: 12:00 PM - 1:15 PM

Event Tags: ASSET, CIS, AI

Venue

Amy Gutmann Hall, Room 414
3333 Chestnut Street
Philadelphia 19104

Abstract:

Machine learning applications are increasingly reliant on black-box pretrained models. To ensure safe use of these models, techniques such as unlearning, guardrails, and watermarking have been proposed to curb model behavior and audit usage. Unfortunately, while these post-hoc approaches give positive safety ‘vibes’ when evaluated in isolation, our work shows that existing techniques are quite brittle when deployed as part of larger systems. In a series of recent works, we show that: (a) small amounts of auxiliary data can be used to ‘jog’ the memory of unlearned models; (b) current unlearning benchmarks obscure deficiencies in both finetuning and guardrail-based approaches; and (c) simple, scalable attacks erode existing LLM watermarking systems and reveal fundamental trade-offs in watermark design. Taken together, these results highlight major deficiencies in the practical use of post-hoc ML safety methods. We end by discussing promising alternatives to post-hoc ML safety methods, which instead aim to ensure safety by design during the development of ML systems.

Zoom Link (if unable to attend in-person): https://upenn.zoom.us/j/91619533220
