feedback :)

#1
by mradermacher - opened

Can't offer much in the way of feedback, but I replaced Unleashed with v2 when it came out. My very subjective experience with my likely weird setup is that its writing is still good, maybe better. However, I found myself regenerating much more often because it didn't follow the prompt correctly (that could mean the first model is better, since I rarely retry generations there - or I might simply be content with more mediocre output).

It also has an annoying tendency to only implement half of the prompt (e.g. 'go there and do that' => very good narration of going there, then it stops). So I hit continue very often, which usually coaxes more out of it, but not always; I also find myself explicitly prompting "[continue]" quite often, which works.

These are very superficial impressions only, of course.

I plan to go back to the first model for a while soon to compare better.

Overall, as with the previous Unleashed model, it ranks at the very top for me.

Thanks for the feedback! With the instruction following, would you say it has a tendency to try and force short replies, or that the responses are long enough but overly focused on one part of the prompt?

I've noticed short replies myself, so that's something I'm in the process of curating more data for, along with more varied system prompt examples where special formatting is required. Part of the dataset focused heavily on making characters essentially try to "do what they want (as long as it's in character)" rather than what they think the user wants, so that might be conflicting with instruction following too.

force short replies, or overly focused on one part of the prompt?

That is very difficult to say. It feels like it's overly focused on one part, but on the other hand, it's always the first part that gets the attention. It doesn't feel as if it just wants short replies, because it can do long replies quite often. So, yeah, it might be that it's focused on one part sometimes. Really, it feels as if it simply forgot the rest.

It could be a conflict of sorts, but then again, there are plenty of characters who can react and add something. A character "not wanting to do that" could explain it, though - it would ignore the part it doesn't like. That said, I rarely strongly force the characters; in fact, I enjoy it when the model does things I didn't anticipate, and I very often follow the model's lead.

do what they want (as long as it's in character)

Oh, wow, that is definitely what I want :) My system prompt has this in it (in lots of variations over time): "When something happens, think about what characters in the same location would say or do and narrate that. Characters must act like real persons would act in the same circumstances, according to their gender, experiences and history. You can deviate from the instructions to make the story better, but you should not contradict it."

That helps with other models; not sure if this model needs it.
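For illustration only, here's a minimal sketch of how a system prompt like that might be wired into a request against a locally hosted model through an OpenAI-compatible chat endpoint (as exposed by e.g. llama.cpp's server or KoboldCpp). The base URL, port, and model name below are placeholders, not details from this thread:

```python
# Minimal sketch (assumptions, not from this thread): sending a narration-style
# system prompt to a local OpenAI-compatible chat endpoint. The base_url and
# model name are placeholders for whatever backend/quant you actually run.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

SYSTEM_PROMPT = (
    "When something happens, think about what characters in the same location "
    "would say or do and narrate that. Characters must act like real persons "
    "would act in the same circumstances, according to their gender, experiences "
    "and history. You can deviate from the instructions to make the story better, "
    "but you should not contradict it."
)

response = client.chat.completions.create(
    model="L3.3-GeneticLemonade-Unleashed-v2.1-70B",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Go to the harbour and ask the old fisherman about the storm."},
    ],
    temperature=0.8,
)
print(response.choices[0].message.content)
```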

However, I think my set-up is very inefficient - the instructions essentially duplicate the story. But none of the front-ends do what I really want, and writing my own front-end is somewhere deep on my list of things to do. So the model not performing perfectly is probably a reflection of my bad setup rather than of the model itself.

Overall, GeneticLemonade Unleashed has been my favourite model for quite a while now, mostly because it actually manages to follow my prompts to a good degree. I had to trim my system prompt quite a bit because it suddenly started to actually follow all the garbage that was in there (same with L3.1-nemotron-sunfall).

On further testing I think it does have some coherency issues that are impacting its overall consistency / instruction following abilities, like you noticed.

I've expanded the dataset with more RP and some instruct data, which seems to have helped, as this new version (https://huggingface.co/zerofata/L3.3-GeneticLemonade-Unleashed-v2.1-70B) appears more consistent so far across my initial test characters.

oh, wow, can't wait to try it :)

Seems you have absolutely delivered!

Haven't had much time to use it, but so far 2.1 is way better - almost no regens needed. Not sure yet how good it is otherwise, but other differences between it and v1 seem to be within noise levels.

Definitely my work model for some time to come.


Good to hear! Thanks again for the quants as always too.

I've had some further feedback that it may still be a bit unwieldy with instructions, although I wasn't able to recreate the issues on this new version. If you do spot anything, feel free to let me know; it does help me refine the data.

Next version will likely be a while away, as getting this data is exhausting with my current pipeline so I'll need to go back into the mines and look at some different methods.

I've had a bit more time playing around with it. I was too euphoric - yes, it still often only interprets part of the instructions until you continue the generation a few times or explicitly ask it to continue.

But it does so far less often than v2, though more often than v1. I would have to start a completely new story to compare adequately. In any case, v1 is not bad, really.

Next version will likely be a while away, as getting this data is exhausting with my current pipeline so I'll need to go back into the mines and look at some different methods.

That is great news in my book (the part about there being a next version, that is! :)
