Name: ASSET Seminar: “Towards Pluralistic Alignment: Foundations for Learning from Diverse Human Preferences”
Start: 2024-10-02T12:00:00-04:00
End: 2024-10-02T13:15:00-04:00
Location: Raisler Lounge (Room 225), Towne Building

ASSET Seminar: “Towards Pluralistic Alignment: Foundations for Learning from Diverse Human Preferences”

October 2, 2024 at 12:00 PM - 1:15 PM

Details

Date: October 2, 2024

Time: 12:00 PM - 1:15 PM

Event Tags: ASSET, CIS, AI

Venue

Raisler Lounge (Room 225), Towne Building
220 South 33rd Street
Philadelphia, PA 19104

Abstract:

Large pre-trained models trained on internet-scale data are often not ready for safe deployment out of the box. They are heavily fine-tuned and aligned using large quantities of human preference data, usually elicited through pairwise comparisons. When aligning an AI/ML model to human preferences or values, it is important to ask whose preferences and values we are aligning it to. Current alignment approaches are severely limited by their inherent uniformity assumption. While there is a rich literature on learning preferences from human judgements using comparison queries, the models typically either learn an average preference over the population, because little data is available per individual, or learn a single individual’s preference using a large number of queries. Furthermore, the metric, i.e., the way humans judge similarity and dissimilarity, is assumed to be known, which does not hold in practice. We aim to overcome these limitations by building mathematical foundations for learning from diverse human preferences.

In this talk, I will present PAL, a personalizable reward modelling framework for pluralistic alignment, which captures diversity in preferences while also capturing commonalities that can be learned by pooling data across individuals. I will also discuss some recent theoretical results on per-user sample complexity for generalization and on fundamental limitations when only a limited number of pairwise comparisons is available.

Based on work with Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, Geelon So, Greg Canal, Blake Mason, Gokcan Tatli, and Rob Nowak. References:

PAL: Pluralistic Alignment Framework for learning from heterogeneous preferences (preprint, 2024)
One-for-all: Simultaneous metric and preference learning (appeared in NeurIPS 2022)
Metric learning via limited pairwise comparisons (appeared in UAI 2024)
Learning Populations of Preferences via pairwise comparisons (appeared in AISTATS 2024)

Zoom Link (if unable to attend in-person): https://upenn.zoom.us/j/95536358996
