Physics Colloquium: "Criticality, Universality and Grokking in Neural Networks"

Date: 
Mon, 09/12/2024, 12:00-13:30
Location: 
Levin Building, Lecture Hall No. 8
Lecturer: Yohai Bar-Sinai, Tel Aviv University

Abstract:
The empirical success of neural models far surpasses our theoretical understanding of them, and explaining their inner workings is important both as a fundamental scientific question and for practical reasons. In this talk I’ll review some “big questions” in the field and the prospects of applying physics-inspired methodologies to tackle them. We will focus on an intriguing phenomenon known as Grokking, in which a neural model learns to generalize long after it has completely fit the training data. This transition often occurs abruptly and has been observed across a range of synthetic and realistic scenarios. By investigating Grokking in simplified models, such as linear or near-linear systems, we leverage tools from statistical mechanics, random matrix theory, and optimization to gain insight. We claim that Grokking is a near-critical phenomenon, appearing near singularities of the training dynamics in a suitably defined “thermodynamic limit”. Specifically, delayed generalization stems from “critical slowing down”, a generic feature of systems near phase transitions, which induces slow dynamical time scales near almost-stable fixed points. This perspective enables us to derive concrete scaling predictions that apply even in more complex settings, owing to the universal nature of critical phenomena.
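
To make the “critical slowing down” mechanism concrete, the following minimal Python sketch (a generic toy illustration chosen for this announcement, not a model from the talk) runs gradient descent on the potential V(x) = (mu/2) x^2 + (1/4) x^4 and counts how many steps it takes to relax to the fixed point at the origin. As mu approaches zero the fixed point becomes only marginally stable and the relaxation time grows roughly like 1/mu, the diverging time scale alluded to in the abstract.

# Toy illustration of critical slowing down (an assumption of this note,
# not the speaker's model): gradient descent on V(x) = (mu/2)*x**2 + (1/4)*x**4.
# The origin is a stable fixed point, but as mu -> 0 it becomes only marginally
# stable and the number of steps needed to relax to it diverges roughly as 1/mu.

def relaxation_steps(mu, x0=1.0, lr=1e-2, tol=1e-3, max_steps=10_000_000):
    """Number of gradient-descent steps until |x| drops below tol."""
    x = x0
    for step in range(max_steps):
        if abs(x) < tol:
            return step
        x -= lr * (mu * x + x ** 3)  # dV/dx = mu*x + x**3
    return max_steps

for mu in (1e-1, 1e-2, 1e-3, 1e-4):
    n = relaxation_steps(mu)
    print(f"mu = {mu:.0e}  ->  {n:>9d} steps  (n * mu ~ {n * mu:.0f})")

In this toy setting the step count grows by roughly a factor of ten each time mu shrinks by ten (up to logarithmic corrections), which is the kind of universal scaling behavior the talk argues carries over to Grokking in more complex models.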