FOLDS SEMINAR: The Hidden Width of Deep ResNets
November 10, 2025 at 2:30 PM - 3:30 PM
Zoom link: https://upenn.zoom.us/j/6130182858
We present a mathematical framework to analyze the training dynamics of deep ResNets that rigorously captures practical architectures (including Transformers) trained from standard random initializations. Our approach combines stochastic approximation of ODEs with propagation-of-chaos arguments to obtain tight convergence rates to the “infinite size” limit of the dynamics. It yields the following insights:
1/ Depth begets width: infinite-depth ResNets of any hidden width behave throughout training as if they were infinitely wide;
2/ Phase diagram: we derive the phase diagram of the training dynamics, which singles out an “ideal” scaling of the hyperparameters (initialization scale and learning rates), extending “CompleteP” to more general architectures;
3/ Optimal shape scaling: our analysis suggests how to scale depth, hidden width and embedding dimension of a ResNet when scaling up parameter count. With the optimal shape and a parameter budget P, we argue that the model converges to its limiting dynamics at rate P^{-1/6}.
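To give a sense of how slow this rate is, here is a back-of-the-envelope reading. It assumes the distance to the limiting dynamics behaves like C P^{-1/6} with a constant C that does not depend on P; the constant C, the factor k and the symbol \varepsilon are notation introduced here for illustration, not taken from the talk.

```latex
\varepsilon(P) \approx C\, P^{-1/6}
\quad\Longrightarrow\quad
\frac{\varepsilon(kP)}{\varepsilon(P)} \approx k^{-1/6},
\qquad
k = 64 \;\Rightarrow\; 64^{-1/6} = \left(2^{6}\right)^{-1/6} = \tfrac{1}{2}.
```

Under that assumption, halving the gap to the limiting dynamics would take roughly 64 times more parameters.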

