FOLDS seminar: An Information Geometric Understanding of Deep Learning

October 23, 2025 at 12:00 PM - 1:00 PM
Details
Date: October 23, 2025
Time: 12:00 PM - 1:00 PM
Event Category: Seminar, Colloquium
  • Organizer: IDEAS Center
  • Venue: Amy Gutmann Hall, Room 414, 3333 Chestnut Street, Philadelphia, PA 19104

    Zoom link: https://upenn.zoom.us/j/98220304722

    I will argue that properties of natural data are what predominantly
    make deep networks so effective. To that end, I will show that deep
    networks work well because of a characteristic structure in the space
    of learnable tasks. The input correlation matrix for typical tasks has
    a “sloppy” eigenspectrum where eigenvalues decay linearly on a
    logarithmic scale. As a consequence, the Hessian and the Fisher
    Information Matrix of a trained network also have a sloppy
    eigenspectrum. Using this idea, I will demonstrate an analytical,
    non-vacuous PAC-Bayes bound on the generalization error for general
    deep networks.
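As an illustration (not from the talk, and with synthetic data standing in for real tasks), the "sloppy" log-linear eigenspectrum described above can be sketched in a few lines of numpy: generate inputs whose correlation matrix has eigenvalues decaying geometrically, then recover the decay rate from samples. The dimension `d`, sample count `n`, and decay ratio `c` are arbitrary choices for the sketch.

```python
import numpy as np

# Illustrative sketch: synthesize data whose input correlation matrix has a
# "sloppy" eigenspectrum -- eigenvalues decaying linearly on a logarithmic
# scale, i.e. lambda_k proportional to c**(-k) -- then recover that decay
# rate from the empirical eigenvalues.
rng = np.random.default_rng(0)
d, n = 50, 5000
c = 1.2                                  # ratio between successive eigenvalues
target = c ** -np.arange(d)              # log-linear target spectrum

# Gaussian inputs with the target (diagonal) covariance.
X = rng.standard_normal((n, d)) * np.sqrt(target)

# Empirical input correlation matrix and its eigenvalues, descending.
C = X.T @ X / n
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# Log-linear decay means log(lambda_k) is affine in k; fit the slope.
slope, _ = np.polyfit(np.arange(d), np.log(eigvals), 1)
print(f"estimated decay ratio: {np.exp(-slope):.3f}")
```

On a log scale the fitted slope recovers `-log(c)`, which is what "linear decay on a logarithmic scale" means operationally.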

    I will show that the training process in deep learning explores a
    remarkably low-dimensional manifold, of dimension as low as three.
    Networks with a
    wide variety of architectures, sizes, optimization and regularization
    methods lie on the same manifold. Networks being trained on different
    tasks (e.g., different subsets of ImageNet) using different methods
    (e.g., supervised, transfer, meta, semi and self-supervised learning)
    also lie on the same low-dimensional manifold.
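A hedged sketch of how one might check such a claim: stack each training checkpoint's prediction vector into a matrix and measure how many principal components its variance occupies. The data here is a synthetic stand-in (a 3-dimensional latent curve plus noise, mimicking the claimed manifold dimension), not output from any real network.

```python
import numpy as np

# Illustrative sketch: estimate the effective dimensionality of a set of
# training "trajectories". Each row is one checkpoint's prediction vector;
# here the rows are synthetic stand-ins generated from a 3-dimensional
# latent factor plus small noise.
rng = np.random.default_rng(1)
n_checkpoints, n_outputs, latent_dim = 200, 1000, 3

latent = rng.standard_normal((n_checkpoints, latent_dim))
mixing = rng.standard_normal((latent_dim, n_outputs))
trajectories = latent @ mixing + 0.01 * rng.standard_normal((n_checkpoints, n_outputs))

# PCA via SVD of the centered trajectory matrix.
Z = trajectories - trajectories.mean(axis=0)
s = np.linalg.svd(Z, compute_uv=False)
explained = s**2 / np.sum(s**2)

# Number of components needed to explain 99% of the variance.
k = int(np.searchsorted(np.cumsum(explained), 0.99)) + 1
print(f"components for 99% variance: {k}")
```

The paper cited below uses a more careful embedding of prediction-space trajectories; this PCA count is only the simplest version of the idea.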

    I will show that typical tasks are highly redundant functions of their
    inputs. Many perception tasks, from visual recognition, semantic
    segmentation, optical flow, depth estimation, to vocalization
    discrimination, can be predicted extremely well regardless of whether
    the data is projected onto the principal subspace where it varies the
    most, an intermediate subspace with moderate variability, or the
    bottom subspace where it varies the least.

    References
    1. Does the data induce capacity control in deep learning? Rubing
    Yang, Jialin Mao, and Pratik Chaudhari. [ICML '22]
    https://arxiv.org/abs/2110.14163
    2. The Training Process of Many Deep Networks Explores the Same
    Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh,
    Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik
    Chaudhari. [PNAS 2024]
    https://arxiv.org/abs/2305.01604
    3. Many Perception Tasks are Highly Redundant Functions of their Input
    Data. Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran Wei,
    Vijay Balasubramanian, Kostas Daniilidis, Pratik Chaudhari.
    (in submission) https://arxiv.org/abs/2407.13841
    4. An Analytical Characterization of Sloppiness in Neural Networks:
    Insights from Linear Models. Jialin Mao, Itay Griniasty, Yan Sun, Mark
    K. Transtrum, James P. Sethna, Pratik Chaudhari.
    (under review) https://arxiv.org/abs/2505.08915