PaperBench: Evaluating AI's Ability to Replicate AI Research • Paper • 2504.01848 • Published 25 days ago • 36
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering • Paper • 2410.07095 • Published Oct 9, 2024 • 6