Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Authors

Jack Miller, Charles O'Neill, Thang Bui

Abstract

In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism for inducing grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight, and on further trends we observe in the training trajectories of a Bayesian neural network (BNN) and a GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
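The abstract does not spell out how spurious dimensions are added, but a minimal sketch of the idea might look like the following. It assumes a modular-addition task with one-hot operand encodings and i.i.d. uniform noise as the spurious dimensions; the task, the encoding, and the noise distribution are all illustrative assumptions here, not details taken from the paper.

```python
import numpy as np

def modular_addition_dataset(p=97, spurious_dims=0, rng=None):
    """Build an (a + b) mod p classification dataset, optionally
    appending spurious input dimensions that carry no task information.

    NOTE: an illustrative sketch of the mechanism described in the
    abstract, not the paper's actual construction.
    """
    rng = np.random.default_rng(rng)
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    a, b = a.ravel(), b.ravel()
    # Informative features: concatenated one-hot encodings of the operands.
    X = np.zeros((p * p, 2 * p))
    X[np.arange(p * p), a] = 1.0
    X[np.arange(p * p), p + b] = 1.0
    # Spurious features: i.i.d. noise, independent of the labels.
    if spurious_dims > 0:
        noise = rng.uniform(-1.0, 1.0, size=(p * p, spurious_dims))
        X = np.concatenate([X, noise], axis=1)
    y = (a + b) % p
    return X, y

# Example: 30 informative dimensions (two one-hots over p=15)
# swamped by 300 spurious ones.
X, y = modular_addition_dataset(p=15, spurious_dims=300, rng=0)
print(X.shape, y.shape)  # (225, 330) (225,)
```

Under this reading, the spurious dimensions enlarge the hypothesis space without adding signal, so a learner guided by complexity and error must first discount them before the generalising solution becomes accessible.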
