Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published 20 days ago • 35
Do LLM Agents Have Regret? A Case Study in Online Learning and Games Paper • 2403.16843 • Published Mar 25, 2024 • 2