Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate Jun 13, 2024 • 54
Leanabell-Prover: Posttraining Scaling in Formal Reasoning Paper • 2504.06122 • Published 13 days ago • 6
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 13 days ago • 163
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published 20 days ago • 61
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published 25 days ago • 43
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild Paper • 2503.18892 • Published 27 days ago • 30
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20 • 46
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 41
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 119
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 21 items • Updated 5 days ago • 126