Nonlinear Physics seminar: "Exploring the loss landscape of Neural Networks with Statistical Mechanics" | The Racah Institute of Physics
Date: 
Wed, 07/02/2024, 12:00-13:30
Location: 
Danciger B Building, Seminar room
Lecturer: Prof. Yohai Bar-Sinai - TAU
Abstract:
The training of neural networks is a complex, high-dimensional, non-convex and noisy optimization problem whose theoretical understanding is interesting both from an applicative perspective and for fundamental reasons. A core challenge is to understand the geometry and topography of the landscape that guides the optimization. I will present two projects in which we employ standard Statistical Mechanics methods to study this landscape. The talk will focus on using Langevin dynamics to study the landscape of an over-parameterized fully connected network performing a classification task. Analyzing the fluctuation statistics, in analogy to thermal dynamics at a constant temperature, we infer a clear geometric description of the low-loss region. We find that it is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations. Furthermore, this dimension is controlled by the number of data points that reside near the classification decision boundary. Importantly, we find that a quadratic approximation of the loss near the minimum is fundamentally inadequate due to the exponential nature of the decision boundary and the flatness of the low-loss region.

Time permitting, I will discuss a recent work in which we analyze "grokking", the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We analyze the loss and accuracy structure of a linear teacher-student model and present exact predictions for the loss evolution and grokking time, using tools from Random Matrix Theory. We demonstrate that the sharp increase in generalization accuracy may not imply a transition from "memorization" to "understanding", but can simply be an artifact of the accuracy measure.
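The core idea of the first project can be illustrated with a minimal sketch (not the speaker's code): run overdamped Langevin dynamics at a fixed "temperature" on a toy loss whose minimum set is a one-dimensional valley embedded in two dimensions, and read off the dimension of the low-loss region from the fluctuation statistics. The loss function, step size, temperature, and the use of the participation ratio as a dimension estimate are all illustrative assumptions here, not details from the talk.

```python
# Hypothetical toy example: estimating the dimension of a low-loss
# region from Langevin fluctuations. The loss is an assumed stand-in:
# flat along w[0] (a "valley" direction), stiff along w[1].
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w):
    # L(w) = 0.5 * k * w[1]**2, so the gradient vanishes along w[0].
    k = 100.0  # assumed stiffness of the confining direction
    g = np.zeros_like(w)
    g[1] = k * w[1]
    return g

def langevin(w0, temperature=1e-3, lr=1e-3, steps=20000):
    """Euler-Maruyama discretization of overdamped Langevin dynamics."""
    w = w0.copy()
    samples = []
    noise_scale = np.sqrt(2.0 * lr * temperature)
    for _ in range(steps):
        w = w - lr * loss_grad(w) + noise_scale * rng.standard_normal(w.shape)
        samples.append(w.copy())
    return np.array(samples)

samples = langevin(np.zeros(2))
cov = np.cov(samples.T)
eigvals = np.linalg.eigvalsh(cov)
# Participation ratio of the covariance spectrum: an effective
# dimension of the fluctuations. Here it comes out close to 1,
# reflecting the single soft (valley) direction.
pr = eigvals.sum() ** 2 / (eigvals ** 2).sum()
print(f"effective dimension ~ {pr:.2f}")
```

The fluctuations along the flat valley direction dwarf those along the stiff direction, so the covariance spectrum is dominated by one eigenvalue and the estimated dimension is close to one; in the talk's setting the analogous quantity is read off from the fluctuation statistics of a trained network.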