flowers-team/StickToYourRoleLeaderboard · Plans to include additional models?

13 days ago

Hello, I'm inquiring whether there are any plans to update this benchmark/leaderboard with additional models. Is there a way for us to request models to be tested/benchmarked?

grg

Flowers Team Inria org 13 days ago

Hello! I'm doing my best to maintain the leaderboard with the time I have between other projects. 🙂
Absolutely — feel free to suggest models! Ideally, they should be runnable with vLLM and have a context length of at least ~8k tokens. You’re welcome to post suggestions here or open a new issue.
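
As a rough pre-check before suggesting a model, the context-length criterion can be tested against the model's `config.json`. This is only a sketch under assumptions: it reads "~8k" as 8192 tokens, it looks at a few keys that Hugging Face configs commonly use for context length, and the helper names are made up for illustration. It says nothing about whether vLLM can actually run the model.

```python
from typing import Optional

# Assumption: "~8k tokens" is interpreted as 8192.
MIN_CONTEXT_TOKENS = 8192

# Keys under which config.json files commonly store the context length.
# Not exhaustive; some architectures use other names.
CONTEXT_KEYS = ("max_position_embeddings", "n_positions", "seq_length")


def context_length(config: dict) -> Optional[int]:
    """Return the first context-length value found in a parsed config.json, if any."""
    for key in CONTEXT_KEYS:
        value = config.get(key)
        if isinstance(value, int):
            return value
    return None


def meets_context_requirement(config: dict, min_tokens: int = MIN_CONTEXT_TOKENS) -> bool:
    """True if the config declares a context length of at least min_tokens."""
    length = context_length(config)
    return length is not None and length >= min_tokens
```

For example, after loading a downloaded config with `json.load`, `meets_context_requirement(config)` gives a quick yes/no. A model whose config omits all of these keys is reported as not meeting the requirement, so a manual look at the model card is still worthwhile.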

SamuraiBarbi

3 days ago

edited 3 days ago

Would we be able to test the following models?

https://huggingface.co/shuttleai/shuttle-3.5

https://huggingface.co/THUDM/GLM-4-32B-0414

https://huggingface.co/Qwen/Qwen3-235B-A22B

https://huggingface.co/Qwen/Qwen3-30B-A3B

https://huggingface.co/Qwen/Qwen3-32B

https://huggingface.co/Qwen/Qwen3-8B

https://huggingface.co/Qwen/Qwen3-4B

These are recently released models for which I've seen creative-writing benchmarks and evaluations, but almost nothing on role play.

Edit: Added Qwen3-235B-A22B to the list