FOLDS seminar: An Information Geometric Understanding of Deep Learning
October 23, 2025 at 12:00 PM - 1:00 PM
Zoom link: https://upenn.zoom.us/j/98220304722
I will argue that properties of natural data are what predominantly
make deep networks so effective. To that end, I will show that deep
networks work well because of a characteristic structure in the space
of learnable tasks. The input correlation matrix of typical tasks has
a “sloppy” eigenspectrum: the eigenvalues decay linearly on a
logarithmic scale, i.e., geometrically with their index. As a
consequence, the Hessian and the Fisher Information Matrix of a
trained network also have sloppy eigenspectra. Using this idea, I
will demonstrate an analytical,
non-vacuous PAC-Bayes bound on the generalization error for general
deep networks.
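As a toy illustration of such a spectrum (the data and parameters here are synthetic, not taken from the referenced papers), eigenvalues that shrink by a constant factor per index trace a straight line on a logarithmic scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 100
# Hypothetical inputs whose covariance eigenvalues halve at each index,
# i.e. decay linearly on a logarithmic scale (a "sloppy" spectrum).
true_eigs = 2.0 ** -np.arange(d)
X = rng.standard_normal((n, d)) * np.sqrt(true_eigs)

# Eigenvalues of the empirical input correlation matrix, largest first.
lam = np.linalg.eigvalsh(X.T @ X / n)[::-1]

# A line fit to log-eigenvalue vs. index recovers the decay rate:
# roughly -1 per index in base 2 for this construction.
slope = np.polyfit(np.arange(30), np.log2(lam[:30]), 1)[0]
print(f"log2-eigenvalue decay per index: {slope:.2f}")
```

Real tasks of course do not halve exactly per index; the point is only that the log-spectrum is close to a straight line rather than having a sharp cutoff.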
I will show that the training process in deep learning explores a
remarkably low-dimensional manifold, with an effective dimension as
low as three. Networks with a wide variety of architectures, sizes,
and optimization and regularization methods lie on the same manifold.
Networks trained on different tasks (e.g., different subsets of
ImageNet) using different methods (e.g., supervised, transfer, meta-,
semi-, and self-supervised learning) also lie on the same
low-dimensional manifold.
I will show that typical tasks are highly redundant functions of their
inputs. Many perception tasks, from visual recognition, semantic
segmentation, optical flow, and depth estimation to vocalization
discrimination, can be predicted extremely well regardless of whether
the data are projected onto the principal subspace where they vary
the most, an intermediate subspace with moderate variability, or the
bottom subspace where they vary the least.
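A toy version of this redundancy (synthetic data, not the perception tasks studied in the papers): when every coordinate carries the label at its own scale, a linear classifier recovers the label almost perfectly both from the highest-variance subspace and from the lowest-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 100
scales = 2.0 ** (-0.2 * np.arange(d))  # sloppy variance profile
y = rng.choice([-1.0, 1.0], size=n)
# Each coordinate carries the label redundantly, at its own scale:
# class means are +/- scales[k], within-class noise 0.3 * scales[k].
X = y[:, None] * scales + 0.3 * scales * rng.standard_normal((n, d))

def accuracy(Z, y):
    # Least-squares linear classifier (a toy stand-in for a network).
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.mean(np.sign(Z @ w) == y)

# In this axis-aligned toy the principal directions are the coordinates
# ranked by variance: first 10 = "principal", last 10 = "bottom" subspace.
order = np.argsort(X.var(axis=0))[::-1]
acc_top = accuracy(X[:, order[:10]], y)
acc_bottom = accuracy(X[:, order[-10:]], y)
print(f"top subspace: {acc_top:.3f}, bottom subspace: {acc_bottom:.3f}")
```

The bottom subspace has roughly a millionth of the variance of the top one, yet the label is equally recoverable from it, because the signal-to-noise ratio per direction is the same; the papers show an analogous effect on real perception data.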
References
1. Does the Data Induce Capacity Control in Deep Learning? Rubing
Yang, Jialin Mao, and Pratik Chaudhari. [ICML 2022]
2. The Training Process of Many Deep Networks Explores the Same
Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh,
Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and
Pratik Chaudhari. [PNAS 2024]
3. Many Perception Tasks Are Highly Redundant Functions of Their
Input Data. Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran
Wei, Vijay Balasubramanian, Kostas Daniilidis, and Pratik Chaudhari.
(in submission)
4. An Analytical Characterization of Sloppiness in Neural Networks:
Insights from Linear Models. Jialin Mao, Itay Griniasty, Yan Sun,
Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari.
(under review)

