A Two-Phase Perspective on Deep Learning Dynamics
By: Robert de Mello Koch, Animik Ghosh
Potential Business Impact:
Helps computers learn better by forgetting some things.
We propose that learning in deep neural networks proceeds in two phases: a rapid curve-fitting phase followed by a slower compression, or coarse-graining, phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent, and the information bottleneck, all of which exhibit a delayed onset of generalization well after training error reaches zero. We empirically show that the associated timescales align in two rather different settings. Mutual information between the hidden layers and the input data emerges as a natural progress measure, complementing circuit-based metrics such as local complexity and the linear mapping number. We argue that the second phase is not actively optimized by standard training algorithms and may be unnecessarily prolonged. Drawing on an analogy with the renormalization group, we suggest that this compression phase reflects a principled form of forgetting that is critical for generalization.
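As a rough illustration of the mutual-information progress measure mentioned in the abstract, the sketch below estimates I(T; X) between a hidden layer's activations and a fixed probe batch of inputs using the simple binning estimator common in information-bottleneck studies. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name, bin count, and logging scheme are illustrative.

```python
# Minimal sketch (illustrative, not the paper's code): track I(T; X) between a
# hidden layer's activations T and the inputs X during training, using the
# binning estimator common in information-bottleneck analyses.

import numpy as np

def estimate_mi(hidden, n_bins=30):
    """Binning estimate of I(T; X) for a deterministic network.

    For a deterministic map T = f(X) evaluated on a finite probe sample,
    I(T; X) = H(T), so we estimate the entropy of the discretized
    activations. `hidden` has shape (n_samples, n_units).
    """
    # Discretize every unit's activation into equal-width bins.
    lo, hi = hidden.min(), hidden.max()
    binned = np.digitize(hidden, np.linspace(lo, hi, n_bins))
    # Count how often each discrete activation pattern occurs across the batch.
    _, counts = np.unique(binned, axis=0, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))  # H(T) in bits

# Usage sketch: call once per logging step on a fixed probe batch, e.g.
#   mi_per_layer = [estimate_mi(layer_activations(probe_x, l)) for l in layers]
# A decline in I(T; X) long after training error reaches zero would mark the
# slower compression (coarse-graining) phase described above.
```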
Similar Papers
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Machine Learning (CS)
Teaches computers to learn better and faster.
A dynamic view of some anomalous phenomena in SGD
Optimization and Control
Helps computers learn better by finding hidden patterns.
Phase Transitions between Accuracy Regimes in L2 regularized Deep Neural Networks
Machine Learning (CS)
Helps computers learn better by avoiding bad learning habits.