ASSET Seminar: Machine Learning: A Data-Centric Perspective, Aleksander Madry (Massachusetts Institute of Technology)
April 12, 2023 at 12:00 PM - 1:30 PM
ABSTRACT:
The training data that modern machine learning models ingest has a major impact on these models’ performance (and on their failures). Yet this impact tends to be neither fully appreciated nor understood at a sufficiently fine-grained level.
In this talk, we will discuss some of the key ways in which training data influences not only what models “learn” but also how they learn it, and we will present tools for dissecting this influence. In particular, we will present a new framework, called datamodeling, that directly casts a model’s predictions as functions of its training data and of the corresponding model class. This framework enables a range of model-class-driven data analyses, including discovering subpopulations, quantifying the brittleness of model predictions, and diagnosing other shortcomings of the training set.
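To make “casting predictions as functions of training data” concrete, the following is a minimal sketch, not the speaker’s implementation. It assumes the common linear form of a datamodel: train on many random subsets of the training set, record the model’s output on one fixed target example, and regress that output on the 0/1 indicator vector of which training examples were included. The functions `toy_train_and_predict` and `fit_datamodel` and all their parameters are hypothetical. The "model" is a toy kernel vote chosen to be exactly linear in the subset, so the regression recovers each example’s contribution exactly. Real models trained on real data are not linear in this way, and published datamodel work uses real training runs and a regularized regressor rather than the plain least squares used here.

```python
import numpy as np

def toy_train_and_predict(mask, x_train, y_train, x_target, bandwidth=1.0):
    """Hypothetical stand-in for 'train on the masked subset, then return the
    model's output (margin) on one target example'. This toy 'model' is an
    unnormalized Gaussian-kernel vote, which is exactly linear in the mask."""
    k = np.exp(-((x_train - x_target) ** 2) / (2 * bandwidth ** 2))
    return float(np.sum(mask * k * y_train))

def fit_datamodel(train_fn, n_train, n_subsets=200, alpha=0.5, seed=0):
    """Fit a linear datamodel by regressing outputs on subset indicators."""
    rng = np.random.default_rng(seed)
    # Each row is a random subset that includes each example with prob. alpha.
    masks = (rng.random((n_subsets, n_train)) < alpha).astype(float)
    outputs = np.array([train_fn(m) for m in masks])
    # Append a bias column and solve ordinary least squares.
    design = np.hstack([masks, np.ones((n_subsets, 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[:-1], coef[-1]  # per-training-example weights, bias

# Usage: five 1-D training points and a target at x = 0.2.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.array([-1.0, -1.0, 1.0, 1.0, 1.0])
weights, bias = fit_datamodel(
    lambda m: toy_train_and_predict(m, x_train, y_train, x_target=0.2),
    n_train=len(x_train),
)
print(np.round(weights, 3), round(float(bias), 3))
```

Each learned weight estimates how much including one training example shifts the prediction on the target. Large positive or negative weights flag the training examples that the prediction depends on most.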

