Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published 18 days ago • 11
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 11 days ago • 84
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published 11 days ago • 32
Temporal Consistency for LLM Reasoning Process Error Identification Paper • 2503.14495 • Published Mar 18 • 9