ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13 • 44
Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games Paper • 2412.13602 • Published Dec 18, 2024 • 1