
Does RL lead this model to prefer giving answers within a certain length range?

#65
by tonyaw

I'm using "joshmiller656/Llama-3.1-Nemotron-70B-Instruct-AWQ-INT4", a quantized version of "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
I've observed that this Nemotron model tries to control its generation length somehow.
I have a prompt containing multiple criteria that asks Nemotron to answer whether the given data matches each criterion.
If the criteria list is short, Nemotron gives a match answer for every criterion, like:

  • [1, 1.1.1.1]: No. xxx
  • [1, 1.1.1.1.1]: No. xxx
  • [1, 1.1.1.2]: No. xxx

If the criteria list is long, Nemotron prefers to use "...\n" to skip matching some of the criteria:
  • [2, 1.1.1.1]: No. xxx
  • ...
  • [2, 1.1.1.5]: Yes. xxx

Do you think this is a problem with my prompt, or just the nature of Nemotron?
The generation length (completion_tokens) is far below 4K. One example:
prompt_tokens: 4769
completion_tokens: 1227
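
For reference, a minimal sketch of how such token counts can be read back, assuming the AWQ checkpoint is served behind an OpenAI-compatible endpoint (e.g., vLLM); the base URL, API key, and prompt below are placeholders, not my exact setup:

```python
from openai import OpenAI

# Assumption: the AWQ model is hosted by an OpenAI-compatible server
# (e.g., vLLM); base_url and api_key are placeholders for that setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="joshmiller656/Llama-3.1-Nemotron-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "<criteria-matching prompt here>"}],
    max_tokens=4096,  # well above the observed completion length
)

# usage carries the same prompt_tokens / completion_tokens fields quoted above
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```

So the skipping happens well before any max_tokens limit is reached, which is why I suspect the model's own length preference rather than a hard cutoff.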
