FOLDS seminar: An Information Geometric Understanding of Deep Learning
October 23, 2025 at 12:00 PM - 1:00 PM
Zoom link: https://upenn.zoom.us/j/98220304722
I will argue that properties of natural data are what predominantly
make deep networks so effective. To that end, I will show that deep
networks work well because of a characteristic structure in the space
of learnable tasks. The input correlation matrix of typical tasks has
a “sloppy” eigenspectrum: the eigenvalues decay linearly on a
logarithmic scale, i.e., geometrically with their index. As a
consequence, the Hessian and the Fisher Information Matrix of a
trained network also have sloppy eigenspectra. Using this idea, I
will demonstrate an analytical,
non-vacuous PAC-Bayes bound on the generalization error for general
deep networks.
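As a toy illustration of such a spectrum (the data and parameters here are synthetic, not taken from the referenced papers), eigenvalues that shrink by a constant factor per index trace a straight line on a logarithmic scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 100
# Hypothetical inputs whose covariance eigenvalues halve at each index,
# i.e. decay linearly on a logarithmic scale (a "sloppy" spectrum).
true_eigs = 2.0 ** -np.arange(d)
X = rng.standard_normal((n, d)) * np.sqrt(true_eigs)

# Eigenvalues of the empirical input correlation matrix, largest first.
lam = np.linalg.eigvalsh(X.T @ X / n)[::-1]

# A line fit to log-eigenvalue vs. index recovers the decay rate:
# roughly -1 per index in base 2 for this construction.
slope = np.polyfit(np.arange(30), np.log2(lam[:30]), 1)[0]
print(f"log2-eigenvalue decay per index: {slope:.2f}")
```

Real tasks of course do not halve exactly per index; the point is only that the log-spectrum is close to a straight line rather than having a sharp cutoff.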
I will show that the training process in deep learning explores a
remarkably low-dimensional manifold, with an effective dimension as
low as three. Networks with a wide variety of architectures, sizes,
and optimization and regularization methods lie on the same manifold.
Networks trained on different tasks (e.g., different subsets of
ImageNet) using different methods (e.g., supervised, transfer, meta-,
semi-, and self-supervised learning) also lie on the same
low-dimensional manifold.
I will show that typical tasks are highly redundant functions of their
inputs. Many perception tasks, from visual recognition, semantic
segmentation, optical flow, and depth estimation to vocalization
discrimination, can be predicted extremely well regardless of whether
the data are projected onto the principal subspace where they vary
the most, an intermediate subspace with moderate variability, or the
bottom subspace where they vary the least.
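A toy version of this redundancy (synthetic data, not the perception tasks studied in the papers): when every coordinate carries the label at its own scale, a linear classifier recovers the label almost perfectly both from the highest-variance subspace and from the lowest-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 100
scales = 2.0 ** (-0.2 * np.arange(d))  # sloppy variance profile
y = rng.choice([-1.0, 1.0], size=n)
# Each coordinate carries the label redundantly, at its own scale:
# class means are +/- scales[k], within-class noise 0.3 * scales[k].
X = y[:, None] * scales + 0.3 * scales * rng.standard_normal((n, d))

def accuracy(Z, y):
    # Least-squares linear classifier (a toy stand-in for a network).
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.mean(np.sign(Z @ w) == y)

# In this axis-aligned toy the principal directions are the coordinates
# ranked by variance: first 10 = "principal", last 10 = "bottom" subspace.
order = np.argsort(X.var(axis=0))[::-1]
acc_top = accuracy(X[:, order[:10]], y)
acc_bottom = accuracy(X[:, order[-10:]], y)
print(f"top subspace: {acc_top:.3f}, bottom subspace: {acc_bottom:.3f}")
```

The bottom subspace has roughly a millionth of the variance of the top one, yet the label is equally recoverable from it, because the signal-to-noise ratio per direction is the same; the papers show an analogous effect on real perception data.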
References
1. Does the Data Induce Capacity Control in Deep Learning? Rubing
Yang, Jialin Mao, and Pratik Chaudhari. [ICML 2022]
2. The Training Process of Many Deep Networks Explores the Same
Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh,
Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and
Pratik Chaudhari. [PNAS 2024]
3. Many Perception Tasks Are Highly Redundant Functions of Their
Input Data. Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran
Wei, Vijay Balasubramanian, Kostas Daniilidis, and Pratik Chaudhari.
(in submission)
4. An Analytical Characterization of Sloppiness in Neural Networks:
Insights from Linear Models. Jialin Mao, Itay Griniasty, Yan Sun,
Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari.
(under review)

