Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published 16 days ago • 11
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 9 days ago • 83
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published 9 days ago • 32
Temporal Consistency for LLM Reasoning Process Error Identification Paper • 2503.14495 • Published Mar 18 • 9
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding Paper • 2502.05431 • Published Feb 8 • 6
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published Feb 10 • 21
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 151
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61
Scaling Flaws of Verifier-Guided Search in Mathematical Reasoning Paper • 2502.00271 • Published Feb 1 • 1
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 385
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 286
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21, 2024 • 62