arxiv:2504.20752

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Published on Apr 29

Submitted by fsteinbauer on May 6

#1 Paper of the day


Authors: Roman Abramov, Felix Steinbauer

Abstract

Transformers have achieved great success in numerous NLP tasks but continue to exhibit notable gaps in multi-step factual reasoning, especially when real-world knowledge is sparse. Recent advances in grokking have demonstrated that neural networks can transition from memorizing to perfectly generalizing once they detect underlying logical patterns - yet these studies have primarily used small, synthetic tasks. In this paper, for the first time, we extend grokking to real-world factual data and address the challenge of dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data to raise the ratio phi_r of inferred facts to atomic facts above the threshold required for grokking. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits rather than degrade accuracy, as it forces the model to rely on relational structure rather than memorization. When evaluated on multi-hop reasoning benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA - substantially improving over strong baselines and matching or exceeding current state-of-the-art results. We further provide an in-depth analysis of how increasing phi_r drives the formation of generalizing circuits inside Transformers. Our findings suggest that grokking-based data augmentation can unlock implicit multi-hop reasoning capabilities, opening the door to more robust and interpretable factual reasoning in large-scale language models.
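To make the ratio phi_r from the abstract concrete, here is a minimal sketch that counts two-hop inferred facts derivable from a small set of atomic facts and divides by the number of atomic facts. It rests on assumptions and is not the authors' code: the triple format, the function names `two_hop_facts` and `inferred_to_atomic_ratio`, the toy facts, and the choice of one global ratio (the paper may define phi_r per relation) are all illustrative.

```python
from collections import defaultdict


def two_hop_facts(atomic):
    """Compose (h, r1, b) and (b, r2, t) into inferred facts (h, (r1, r2), t).

    Hypothetical helper: the paper's actual construction may differ.
    """
    # Index outgoing edges by head entity for fast composition.
    by_head = defaultdict(list)
    for h, r, t in atomic:
        by_head[h].append((r, t))

    inferred = set()
    for h, r1, bridge in atomic:
        # Every outgoing edge of the bridge entity yields one two-hop fact.
        for r2, t in by_head.get(bridge, []):
            inferred.add((h, (r1, r2), t))
    return inferred


def inferred_to_atomic_ratio(atomic, inferred):
    """Global ratio of inferred to atomic facts (an assumed reading of phi_r)."""
    atomic = set(atomic)
    return len(inferred) / len(atomic) if atomic else 0.0


# Toy knowledge graph (made-up facts, for illustration only).
atomic = [
    ("Alice", "born_in", "Paris"),
    ("Bob", "born_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Paris", "located_on", "Seine"),
]

inferred = two_hop_facts(atomic)
phi = inferred_to_atomic_ratio(atomic, inferred)
print(f"{len(inferred)} inferred facts, ratio = {phi:.2f}")
```

In a sparse graph few entities serve as bridges, so this ratio stays low. Adding synthetic atomic facts that attach new outgoing edges to bridge entities increases the number of composable pairs, which is the kind of augmentation the abstract describes.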


Community

fsteinbauer

Paper author Paper submitter 10 days ago

Hey, we made grokking work for transformers and real-world data.

Of course, there are some caveats, and it's not straightforward to apply. But we think the results so far are actually quite promising! For more details, check out our paper here.

bardhprenkaj

10 days ago

Can you put the code reference on GitHub? I'm interested in checking whether grokking GPT-2 into multi-hop reasoning is viable. Do you think it is? If so, what would you suggest doing?

fsteinbauer

Paper author 10 days ago

I think it is definitely possible, but we haven't tried. I assume it might take much longer and/or need more inferred facts (i.e., a higher ratio phi_r), as the generalization circuit that needs to form within the layers is more complex and deeper. Actually, we are currently working on an approach to enable grokking for more complex (and deeper) reasoning subcircuits to tackle more unstructured / messy datasets.

Regarding Code: The training data is not the only thing that is severely unstructured 😅
@monsetrum If you have the time, it would really be nice to clean the code and put it online as people seem to be interested (also on Papers With Code if we are on it).

yjh415

4 days ago

an audio overview: https://youtu.be/ov0Pxy8otjk

fsteinbauer

Paper author 3 days ago

Not certain whether I should report you or thank you for what you did with our paper there. 😅
Imo, the text-to-speech is technically nicely executed, though! (The "OD" is pronounced weirdly...)