Progressive Sharpening Is Caused by Layerwise Jacobian Alignment
Mark Lowell
Clarifai
Time: 2025-10-08, 12:00 - 13:00 ET
Location: Rice 540 and Zoom
Abstract: During neural network training, the sharpness of the training loss (the largest eigenvalue of its Hessian) rises until training sits at the edge of stability. As a result, even non-stochastic gradient descent does not accurately follow the underlying dynamical system defined by the gradient flow of the training loss. We use an exponential Euler solver to train the network without entering the edge of stability, so that we accurately approximate the true gradient flow dynamics. We demonstrate experimentally that the increase in sharpness is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the preactivations near the network's inputs can cause a large change in its outputs. We further demonstrate that the degree of alignment scales with dataset size according to a power law, with a coefficient of determination between 0.74 and 0.98.
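
The abstract does not spell out how the exponential Euler solver is applied; as a rough illustration only, here is a minimal sketch of one exponential Euler step for the gradient flow d(theta)/dt = -grad L(theta), using an assumed two-dimensional toy loss and a full eigendecomposition of the Hessian (feasible only at this tiny scale, not the speaker's implementation):

import numpy as np

def loss_grad_hess(theta):
    # Assumed toy loss L(theta) = 0.5 theta^T A theta - 0.5 * sum(cos(theta)),
    # chosen only so the Hessian varies along the trajectory.
    A = np.diag([10.0, 1.0])
    grad = A @ theta + 0.5 * np.sin(theta)
    hess = A + 0.5 * np.diag(np.cos(theta))
    return grad, hess

def exponential_euler_step(theta, h):
    # Linearize the gradient flow around the current iterate theta_k,
    #   d(theta)/dt = -grad - H (theta - theta_k),
    # and integrate the linear ODE exactly over the step h:
    #   theta_{k+1} = theta_k - H^{-1} (I - exp(-h H)) grad.
    grad, hess = loss_grad_hess(theta)
    evals, evecs = np.linalg.eigh(hess)
    phi = np.empty_like(evals)
    small = np.abs(evals) < 1e-12
    phi[~small] = (1.0 - np.exp(-h * evals[~small])) / evals[~small]
    phi[small] = h   # limit of (1 - exp(-h*lam)) / lam as lam -> 0
    return theta - evecs @ (phi * (evecs.T @ grad))

theta = np.array([1.0, -1.0])
for _ in range(50):
    theta = exponential_euler_step(theta, h=0.5)
print(theta)  # approaches the minimum without the oscillation plain GD shows at this step size

Because the linearized flow is integrated exactly, the step size can exceed the 2/sharpness threshold at which plain gradient descent would oscillate or diverge, which is what allows training to proceed without entering the edge of stability.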
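Likewise, here is a minimal sketch of one way layerwise Jacobian alignment could be quantified for a small ReLU MLP: the spectral norm of the chained Jacobian divided by the product of the per-layer spectral norms. The network widths and this particular alignment score are assumptions for illustration, not necessarily the metric used in the talk:

import numpy as np

rng = np.random.default_rng(0)
widths = [8, 32, 32, 32, 4]                       # assumed toy MLP sizes
weights = [rng.standard_normal((m, n)) / np.sqrt(n)
           for n, m in zip(widths[:-1], widths[1:])]

def layer_jacobians(x):
    # J_l = d z_{l+1} / d z_l = W_{l+1} @ diag(relu'(z_l)) for a ReLU MLP,
    # where z_l is the preactivation of layer l.
    z = weights[0] @ x
    jacs = []
    for W in weights[1:]:
        D = np.diag((z > 0).astype(float))        # ReLU derivative at z_l
        jacs.append(W @ D)
        z = W @ np.maximum(z, 0.0)
    return jacs

def alignment(jacs):
    # Spectral norm of the chained Jacobian divided by the product of the
    # per-layer spectral norms; the ratio is at most 1 and approaches 1 only
    # when the layers' top singular directions line up.
    prod = jacs[0]
    for J in jacs[1:]:
        prod = J @ prod
    return np.linalg.norm(prod, 2) / np.prod([np.linalg.norm(J, 2) for J in jacs])

x = rng.standard_normal(widths[0])
print(f"alignment score: {alignment(layer_jacobians(x)):.3f}")

When this ratio is near 1, a perturbation of the earliest preactivations is amplified by the full product of layer gains, which is the mechanism the abstract links to the growth of the sharpness.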
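Finally, the reported power-law scaling can be checked with an ordinary least-squares fit in log-log space; the sketch below shows the procedure on invented data points (the numbers are not results from the talk), with the coefficient of determination computed on the log-transformed values:

import numpy as np

n = np.array([1e2, 3e2, 1e3, 3e3, 1e4])           # dataset sizes (invented)
align = np.array([0.11, 0.17, 0.26, 0.38, 0.55])  # alignment scores (invented)

log_n, log_a = np.log(n), np.log(align)
p, log_c = np.polyfit(log_n, log_a, 1)            # slope p is the power-law exponent
pred = p * log_n + log_c
r2 = 1.0 - np.sum((log_a - pred) ** 2) / np.sum((log_a - log_a.mean()) ** 2)
print(f"alignment ~ {np.exp(log_c):.3f} * n^{p:.3f}, R^2 = {r2:.3f}")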
Bio: Mark Lowell is a staff research scientist at Clarifai working on artificial intelligence. After earning a Ph.D. in mathematics from the University of Massachusetts Amherst, he worked at Booz Allen Hamilton and the National Geospatial-Intelligence Agency on computer vision for object detection in overhead imagery. His research interests include computer vision, deep learning theory, and ecological modeling.