Chat completion issues

#5
opened by FrenzyBiscuit

I'm trying to get this model to work on chat completion and it will not stop talking during the thinking phase. Just goes on indefinitely.

I've tried replacing the tokenizer_config.json with one from the regular QwQ model with no success.

Should I just assume the model is cooked for chat completion? I'd really like to update my exl2 quants of the model with a working config.

Thank you!

trashpanda org

Hey @FrenzyBiscuit!

We ran a couple of tests on unquanted and GGUF'd Snowdrop and can't replicate this via chat completion; it doesn't run on indefinitely for us, during thinking or otherwise.

Mind sending over a preset where you saw this happening? Would love to keep trying to replicate it

Sure, on openwebui everything is set to "default" and I am using no system prompt. Here is the quant I am using:

https://huggingface.co/ReadyArt/QwQ-32B-Snowdrop-v0_EXL2_8.0bpw_H8

This is what it spits out (and the page just keeps going). I guess it's possible it's the lack of a system prompt and/or a broken quant, though.

(screenshots of the openwebui output attached)

I can try the recommended settings listed on the main page, but usually when the assistant and user messages start showing up in the output, it means the chat template is busted.
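One thing I can also try to rule out the frontend is hitting the backend's chat-completions endpoint directly. A minimal sketch, assuming an OpenAI-compatible server; the base URL, API key, and model name below are placeholders for my local setup:

```python
# Minimal sketch: call the backend's OpenAI-compatible endpoint directly,
# bypassing openwebui, to see whether the runaway thinking still happens.
# base_url, api_key, and model are placeholders for whatever the EXL2
# backend actually exposes locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="QwQ-32B-Snowdrop-v0-exl2",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=512,  # hard cap so a runaway generation still terminates
)

print(response.choices[0].message.content)
```

If the raw API call shows the same never-ending thinking, that would point at the quant/template rather than openwebui.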

trashpanda org

I've got openwebui installed now, will try soon and report back. We did our tests in other frontends, just not this one.

Great, thanks!

Quick note: I'm not too familiar with GGUF since I don't use that quant type, but my understanding is that GGUF quants use their own embedded template for chat completion, so you likely would not be able to replicate the issue with GGUF.

EXL2 uses the tokenizer_config.json with the chat_template directly.
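In case it's useful, here's a quick way to check what the quant would actually pick up; a minimal sketch that just downloads the tokenizer_config.json and prints its chat_template field (repo ID copied from the quant linked above):

```python
# Minimal sketch: fetch the quant's tokenizer_config.json and print the
# chat_template it ships with, to see how the thinking/role tags are handled.
import json
from huggingface_hub import hf_hub_download

# Repo ID taken from the quant linked above; adjust for other quants.
path = hf_hub_download(
    "ReadyArt/QwQ-32B-Snowdrop-v0_EXL2_8.0bpw_H8",
    "tokenizer_config.json",
)

with open(path, encoding="utf-8") as f:
    config = json.load(f)

print(config.get("chat_template", "no chat_template field found"))
print("eos_token:", config.get("eos_token"))
```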

GGUF quants just embed the Jinja template from the original tokenizer.
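For comparison, the embedded template can be pulled out of a GGUF file's metadata under the tokenizer.chat_template key; a rough sketch with the gguf Python package (the filename is a placeholder, and the exact field-decoding details may vary between gguf versions):

```python
# Rough sketch: read the chat template embedded in a GGUF file's metadata.
# The filename is a placeholder; field decoding may differ across gguf versions.
from gguf import GGUFReader

reader = GGUFReader("QwQ-32B-Snowdrop-v0-Q4_K_M.gguf")  # placeholder filename

field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no embedded chat template found")
else:
    # For string fields, field.data indexes the part holding the raw bytes.
    print(field.parts[field.data[0]].tobytes().decode("utf-8"))
```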

(screenshot attached)
