Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published 17 days ago • 90
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 15 days ago • 302k • 1.39k
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 391