Chat completion issues
I'm trying to get this model to work with chat completion, and it will not stop talking during the thinking phase. It just goes on indefinitely.
I've tried replacing the tokenizer_config.json with one from the regular QwQ model with no success.
Should I just assume the model is cooked for chat completion? I'd really like to update my exl2 quants of the model with a working config.
Thank you!
Hey @FrenzyBiscuit !
We ran a couple of tests on unquanted and GGUF'd Snowdrop and can't replicate this via chat completion; we're not able to get it to run on indefinitely, thinking or otherwise.
Mind sending over a preset where you saw this happening? Would love to keep trying to replicate it
Sure. On openwebui, everything is set to "default" and I'm not using a system prompt. Here is the quant I am using:
https://huggingface.co/ReadyArt/QwQ-32B-Snowdrop-v0_EXL2_8.0bpw_H8
This is what it spits out (and the page keeps going down). I guess it's possible it's the lack of system prompt and/or a broken quant, though.
I can try the recommended settings listed on the main page, but usually when the assistant and user messages start showing up in the output like that, it means the template is busted.
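For what it's worth, here's a minimal sketch of how I'd take openwebui out of the equation, assuming an OpenAI-compatible backend (e.g. TabbyAPI serving the EXL2 quant) on localhost:5000; the URL, API key, and model name are placeholders to adjust:

```python
# Hit the backend's chat completion endpoint directly, bypassing the frontend,
# to see whether the endless thinking still happens without openwebui's templating.
import requests

resp = requests.post(
    "http://localhost:5000/v1/chat/completions",  # adjust to your backend
    headers={"Authorization": "Bearer dummy-key"},  # placeholder key
    json={
        "model": "QwQ-32B-Snowdrop-v0",  # placeholder model name
        "messages": [{"role": "user", "content": "Say hi in one sentence."}],
        "max_tokens": 512,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```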
I've got openwebui installed now and will try soon and report back. We did our tests in other frontends, just not this one.
Great, thanks!
Quick note. I'm not too familiar with GGUF since I don't use that quant type, but my understanding is GGUF quants carry their own chat template inside the file's metadata, and the backend uses that for chat completion. You likely would not be able to replicate the issue with GGUF.
EXL2 backends read the chat_template from tokenizer_config.json directly.
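If it helps, here's a rough sanity check using the transformers library (the repo name is just the quant linked above; a local path to the downloaded quant works too). It renders a toy conversation through whatever chat_template ships in tokenizer_config.json, so you can see the exact prompt an EXL2 backend would build:

```python
from transformers import AutoTokenizer

# Quant repo from the link above, or a local directory containing tokenizer_config.json
repo = "ReadyArt/QwQ-32B-Snowdrop-v0_EXL2_8.0bpw_H8"
tokenizer = AutoTokenizer.from_pretrained(repo)

messages = [
    {"role": "user", "content": "Say hi in one sentence."},
]

# Render the prompt as plain text, with the assistant turn opener appended
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
print("has chat_template:", tokenizer.chat_template is not None)
```

If the rendered prompt is missing the role markers or looks mangled, the chat_template in that tokenizer_config.json is the likely culprit.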