Aioli: A Unified Optimization Framework for Language Model Data Mixing Paper • 2411.05735 • Published Nov 8, 2024
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published 12 days ago
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions Paper • 2502.13124 • Published Feb 18
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction Paper • 2503.19658 • Published Mar 25