Model Summary

s1.1 is our successor to s1, with better reasoning performance achieved by leveraging reasoning traces from r1 instead of Gemini.

This model is a successor to s1-32B with slightly better performance. Thanks to Bespoke Labs (Ryan Marten) for helping generate r1 traces for s1K with Curator.

Use

The model usage is documented here.

Evaluation

| Metric | s1-32B | s1.1-32B | o1-preview | o1 | DeepSeek-R1 | DeepSeek-R1-Distill-Qwen-32B |
|---|---|---|---|---|---|---|
| # examples | 1K | 1K | ? | ? | >800K | 800K |
| AIME2024 | 56.7 | 56.7 | 40.0 | 74.4 | 79.8 | 72.6 |
| AIME2025 I | 26.7 | 60.0 | 37.5 | ? | 65.0 | 46.1 |
| MATH500 | 93.0 | 95.4 | 81.4 | 94.8 | 97.3 | 94.3 |
| GPQA-Diamond | 59.6 | 63.6 | 75.2 | 77.3 | 71.5 | 62.1 |

Note that s1-32B and s1.1-32B use budget forcing in this table: the end-of-thinking delimiter is ignored and "Wait" is appended, up to four times.
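
The budget forcing described in the note can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `budget_force` and the `generate` callable are hypothetical names, and `generate` is assumed to take the text so far and return the continuation up to (but not including) the model's end-of-thinking delimiter.

```python
from typing import Callable


def budget_force(
    generate: Callable[[str], str],  # hypothetical: returns text up to end-of-thinking
    prompt: str,
    max_waits: int = 4,              # "up to four times", per the note above
    wait_token: str = "Wait",
) -> str:
    """Extend the thinking trace by suppressing end-of-thinking max_waits times."""
    trace = ""
    for attempt in range(max_waits + 1):
        # The model thinks until it tries to emit end-of-thinking.
        trace += generate(prompt + trace)
        if attempt < max_waits:
            # Ignore the end-of-thinking attempt and nudge the model to continue.
            trace += wait_token
    return trace
```

With a real model, `generate` would wrap an inference call that stops at the end-of-thinking delimiter; the final answer would then be generated from `prompt` plus the returned trace.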

Safetensors
Model size: 32.8B params
Tensor type: F32

Model tree for simplescaling/s1.1-32B

Base model: Qwen/Qwen2.5-32B (this model is finetuned from it)