ASSET Seminar: “Towards Pluralistic Alignment: Foundations for Learning from Diverse Human Preferences”

October 2, 2024 at 12:00 PM - 1:15 PM
Details
Date: October 2, 2024
Time: 12:00 PM - 1:15 PM
  • Venue
    Raisler Lounge (Room 225), Towne Building, 220 South 33rd Street
    Philadelphia, PA 19104

    Abstract:

    Large pre-trained models trained on internet-scale data are often not ready for safe deployment out of the box. They are heavily fine-tuned and aligned using large quantities of human preference data, usually elicited through pairwise comparisons. When aligning an AI/ML model to human preferences or values, it is important to ask whose preferences and values we are aligning it to. Current alignment approaches are severely limited by their inherent uniformity assumption. While there is a rich literature on learning preferences from human judgements using comparison queries, the models often focus either on learning an average preference over the population, because of the limited amount of data available per individual, or on learning a single individual’s preference using many queries. Furthermore, the metric, i.e., the way humans judge similarity and dissimilarity, is assumed to be known, which does not hold in practice. We aim to overcome these limitations by building mathematical foundations for learning from diverse human preferences.

    In this talk, I will present PAL, a personalizable reward modelling framework for pluralistic alignment, which captures diversity in preferences while also capturing commonalities that can be learned by pooling data across individuals. I will also discuss some recent theoretical results on per-user sample complexity for generalization and on fundamental limitations when only limited pairwise comparisons are available.

    Based on work with Daiwei Chen, Yi Chen, Aniket Rege, Zhi Wang, Geelon So, Greg Canal, Blake Mason, Gokcan Tatli, and Rob Nowak. References:

    1. PAL: Pluralistic Alignment Framework for learning from heterogeneous preferences (preprint, 2024)
    2. One-for-all: Simultaneous metric and preference learning (appeared in NeurIPS 2022)
    3. Metric learning via limited pairwise comparisons (appeared in UAI 2024)
    4. Learning Populations of Preferences via pairwise comparisons (appeared in AISTATS 2024)

    Zoom link (if unable to attend in person): https://upenn.zoom.us/j/95536358996