SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19 • 10