Does the RL training lead this model to prefer answers within a certain length range?
#65 opened by tonyaw
I'm using "joshmiller656/Llama-3.1-Nemotron-70B-Instruct-AWQ-INT4", a quantized version of "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
I observed that this Nemotron model tries to control the generation length somehow.
I have a prompt that contains multiple criteria and asks Nemotron to answer whether the given data matches each criterion.
If the criteria list is short, Nemotron gives a match answer for every criterion, like:
- [1, 1.1.1.1]: No. xxx
- [1, 1.1.1.1.1]: No. xxx
- [1, 1.1.1.2]: No. xxx
If the criteria list is long, Nemotron prefers to use "...\n" to skip some of the criteria, like:
- [2, 1.1.1.1]: No. xxx
- ...
- [2, 1.1.1.5]: Yes. xxx
Do you think this is a problem with my prompt, or the nature of Nemotron?
The generation length (completion_tokens) is far below 4K. One example:
prompt_tokens: 4769
completion_tokens: 1227
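In case it helps others hitting the same behavior: one workaround I'm considering is batching the criteria into several shorter requests, so each completion stays small enough that the model answers every criterion instead of eliding with "...". This is just a sketch (the `chunk_criteria` helper and the criterion labels are hypothetical, not from my actual prompt):

```python
def chunk_criteria(criteria, batch_size=5):
    """Split a long criteria list into smaller batches.

    Each batch would then be sent as its own request, keeping the
    expected completion short so the model is less tempted to skip
    criteria with "...".
    """
    return [criteria[i:i + batch_size]
            for i in range(0, len(criteria), batch_size)]

# Hypothetical criterion labels in the same style as the output above.
criteria = [f"[2, 1.1.1.{i}]" for i in range(1, 13)]

batches = chunk_criteria(criteria, batch_size=5)
# 12 criteria with batch_size=5 -> batches of sizes 5, 5, 2
```

The downside is more round trips and repeating the shared context in every request, so the total prompt_tokens cost goes up.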