Chimera not separating reasoning from response
It seems as though Chimera has stopped wrapping its reasoning process in think tags on both ends, making it harder to separate its reasoning from its actual answer - unlike DeepSeek R1, for example. This manifests as both Chutes and OpenRouter neither separating the reasoning from the response in the API response nor separating out reasoning in their built-in chat functionality. Is this intentional?
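For reference, here is a minimal way to check what actually comes back over the API (the model slug and the separate reasoning field are assumptions on my part, not something confirmed in this thread):

```python
# Rough check of whether reasoning comes back separated or inlined in `content`.
# The model slug and the `reasoning` field are assumptions, not confirmed here.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "tngtech/deepseek-r1t-chimera",  # assumed slug
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
    },
    timeout=120,
)
message = resp.json()["choices"][0]["message"]

# With R1-style behaviour, reasoning sits in its own field and `content` holds
# only the answer; with the current behaviour, the reasoning (usually followed
# by a stray closing think tag) shows up inside `content` itself.
print("separate reasoning field:", bool(message.get("reasoning")))
print("content preview:", message["content"][:120])
```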
The model doesn't even reason for me anymore haha.
Hello there,
thank you for pointing this out. Maybe OpenRouter changed their chat template? If you look at the statistics, Chimera usually had a completion-to-reasoning token ratio between 1:1 and 3:1.
Since May 13th, this has changed significantly. The completion-to-reasoning ratio is now something like 50:1, which suggests that reasoning has become very rare in comparison.
Maybe ask OpenRouter?
PS: We did not change the model.
Update: We wrote this to OpenRouter on X.
"To us it seems that you no longer use the chat-template provided with in the tokenizer_config.json file."
"We followed the suggested way from DeepSeek for using the original R1 version: prefix the Assistant message with "think" (in angle brackets) to ensure reasoning. This got added to the tokenizer, as can be seen by this change from DeepSeek in HF: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f
We provide the exact same tokenizer_config.json file as DeepSeek does. [...]
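For anyone who wants to check this locally: rendering the template with `transformers` shows whether the generation prompt already opens the think block. This is only a sketch; the exact tail of the rendered string is what the linked DeepSeek commit changed.

```python
# Render the chat template from this repo's tokenizer_config.json and inspect
# how the generation prompt ends. Sketch only; the output depends on the
# template actually shipped with the tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tngtech/DeepSeek-R1T-Chimera")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
# After DeepSeek's linked change, the prompt should already end with the opening
# think tag, so the model only ever generates the closing one itself.
print(repr(prompt[-40:]))
```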
Have you got any updates on this? The problem still appears to be happening. I reached out on the OpenRouter Discord server but have not heard anything back from them.
I just got an answer from the person at Chutes who is in charge of the model on their side, and I think they fixed it (thanks, JD!). I also just checked using the OpenRouter chat and asked simple questions, e.g.
"Can you give me the sum of the squares of the integers from 1 to 10, please?"
The Chimera started thinking properly:
"Okay, so I need to find the sum of the squares of the integers from 1 to 10. Let me think about how to approach this.
First, I recall that the squares of integers from 1 to 10 are each number multiplied by itself. So, I can list them out:
1² = 1
2² = 4
..."
Can you try again on your side?
Probably it was SGLang 0.4.6.post4, which was released on May 13th, the same day the reasoning change appeared in the OpenRouter statistics.
They upgraded from SGLang 0.4.5 on May 13th. This seems to be the cause.
We're looking into whether we can recreate it on our side or otherwise advise.
The chat template defined in this repo includes the opening think tag; since it is now part of the chat template itself, the model will likely never emit one in its output.
Therefore, all tokens before the closing tag are reasoning tokens.
To revert this, simply remove the opening think tag from the chat template and let the model generate it itself.
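Until the template or the providers change, a simple client-side workaround is to split the raw completion on the closing tag yourself; a minimal sketch:

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Works whether the opening think tag was baked into the chat template
    (so only the closing tag appears in the output) or the model emitted both.
    """
    if "</think>" in content:
        reasoning, answer = content.split("</think>", 1)
        return reasoning.removeprefix("<think>").strip(), answer.strip()
    # No closing tag at all: nothing to separate.
    return "", content.strip()


reasoning, answer = split_reasoning(
    "Okay, so I need to find the sum of the squares ... </think> The sum is 385."
)
print(answer)  # "The sum is 385."
```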
The SGLang update was on the 14th, but it does not change the chat template to my knowledge.
The issue described has not been resolved. Using your example, the reasoning is still being written with the output - compare the output in OpenRouter Chat to something like Deepseek:R1 and you can clearly see that R1 is separating out reasoning while Chimera is not. This also happens in Chutes Chat and in Chutes Playground.
Same, it still doesn't do reasoning tokens correctly /:
Is there any update on this? OpenRouter and Chutes both continue not to separate out reasoning from response.
Hello,
thanks for asking. We have also been trying some TNG-local adaptations to deal with it, but that work is not finished yet. Do you have some example prompts, preferably of a type that is relevant to you, that show the undesired behaviour? We can then test right away whether the newest local version handles those prompts correctly.
You can also email us these or send them via LinkedIn etc.
Cheers,
Henrik
Hello,
The example prompt used earlier works just fine: "Can you give me the sum of the squares of the integers from 1 to 10, please?" - although any prompt that triggers Chimera's reasoning works to illustrate the issue. They could be maths questions, questions about characters in popular media, interesting facts about the world. Things like that.
Is there any update on this? OpenRouter and Chutes both continue not to separate out reasoning from response.
https://huggingface.co/tngtech/DeepSeek-R1T-Chimera/discussions/3#682cab7ac2f5e0c9b99bb2cb
As mentioned, the opening think tag is now baked into the chat template, so the model never produces one itself - it's inherent to the template.
The maintainers of this model can remove the forced think tag from the chat template and it will work as before.
We can try to update our front end to display it nicely given this change, but we can't/won't change the API, for example. If this is the chat template that the model maintainer wants to use, we won't override it.
Whoever maintains the chat template determines whether the think tag is produced as part of the model output or supplied by the template itself (and therefore not included in the output).