How Did You Convert Felladrin/Llama-160M-Chat-v1 to ONNX Format?
Hi! I’m interested in using the Felladrin/Llama-160M-Chat-v1 model with Transformers.js, which works best with ONNX models—ideally in INT8 for better performance. I was wondering how you converted the model to ONNX format (and if you used any specific tools or steps to quantize it to INT8). Could you share your conversion process or any scripts you used? I'd love to replicate it for local usage. Thanks in advance!
Hi! At the time I converted it, I used the conversion script from the transformers.js repository. Since then, we’ve created this space: https://huggingface.co/spaces/onnx-community/convert-to-onnx (which also uses the official conversion script), and it’s straightforward. You can also clone the space locally and run it through Docker, or download the files and run it directly with Python. Hopefully you’ll be able to convert any model to ONNX easily (as long as the architecture is supported by the ONNX library)!
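Once the conversion finishes, loading the result with Transformers.js could look like the sketch below. Treat it as a minimal example only: the repo name is a placeholder for wherever you upload the converted files, and it assumes the 2.x API of @xenova/transformers.

```js
// Minimal sketch, assuming the converted files (with the ONNX weights in an
// onnx/ subfolder) were uploaded to a hypothetical repo
// "your-username/Llama-160M-Chat-v1-onnx".
import { pipeline } from '@xenova/transformers';

const generator = await pipeline(
  'text-generation',
  'your-username/Llama-160M-Chat-v1-onnx',
  { quantized: true } // load the INT8 weights produced by quantization
);

const output = await generator('What is ONNX?', { max_new_tokens: 64 });
console.log(output); // [{ generated_text: '...' }]
```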
Thanks so much, @Felladrin . I need to understand how we can create decoder_model_merged.onnx. I'm using these models to run on a mobile device with React Native, but it seems that using just model.onnx gives me poor results. Is there something specific or important about these decoder models that I should be aware of?
I used the following script to convert TinyLlama/TinyLlama-1.1B-Chat-v1.0 to ONNX:

```bash
!python3 /content/convert.py \
  --quantize \
  --task text-generation \
  --model_id TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
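For reference, the script writes the exported files under a local models/ directory. The exact layout depends on the script version and task, so treat this as a rough sketch:

```
models/TinyLlama/TinyLlama-1.1B-Chat-v1.0/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx            # full-precision graph
    └── model_quantized.onnx  # INT8 weights (from --quantize)
```

Older versions of the script split text-generation exports into decoder_model.onnx, decoder_model_with_past.onnx, and decoder_model_merged.onnx instead of a single model.onnx.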
After the conversion, I tried loading the quantized model, but I encountered the following error:
```
x.split is not a function (it is undefined)
```

This error comes from the tokenizer.js file:
```js
this.bpe_ranks = new Map(config.merges.map((x, i) => [x, i]));
this.merges = config.merges.map(x => x.split(this.BPE_SPLIT_TOKEN));
```
It looks like config.merges is undefined or not in the expected format.
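For reference, the two serialization formats differ like this (illustrative values; the version boundary is the one documented in the newer tokenizer code quoted below):

```js
// Old format (tokenizers < 0.20.0): merges is a string[], each pair joined by a space.
const oldMerges = ['Ġ t', 'h e', 'i n'];
oldMerges.map(x => x.split(' ')); // works: [['Ġ', 't'], ['h', 'e'], ['i', 'n']]

// New format (tokenizers >= 0.20.0): merges is already a [string, string][].
const newMerges = [['Ġ', 't'], ['h', 'e'], ['i', 'n']];
newMerges.map(x => x.split(' ')); // TypeError: x.split is not a function
```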
I also tested my setup using onnx-community/TinyLlama-1.1B-Chat-v1.0-ONNX, and that version works fine, so it seems the issue is related to the conversion process.
My goal is to fine-tune this model on my dataset, convert it to ONNX, and run it with ONNX Runtime, but I'm blocked by this issue.
Could you help clarify what might be going wrong? Am I missing a step in the conversion process?
Thanks!
Hi @Xenova,
Sorry, I know I need to use the new @huggingface/transformers package, but I can't, since I'm running these models on mobile in a React Native app. So I'm using @xenova/transformers 2.17.2.
```js
// Tokenizers >= 0.20.0 serializes BPE merges as a [string, string][] instead of a string[],
// which resolves the ambiguity for merges containing spaces.
const use_new_merge_format = Array.isArray(config.merges[0]);

/** @type {[string, string][]} */
this.merges = use_new_merge_format
    ? /** @type {[string, string][]} */ (config.merges)
    : (/** @type {string[]} */ (config.merges)).map(
          x => /** @type {[string, string]} */ (x.split(' ', 2))
      );

this.bpe_ranks = new Map(this.merges.map((x, i) => [JSON.stringify(x), i]));
```
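(A note on the design, for anyone backporting this to 2.x: the ranks map is keyed by JSON.stringify(pair) rather than the old space-joined string, because a merge half can itself contain a space, which is exactly the ambiguity the comment above mentions.)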
I fixed the x.split error using this patch, but now I get repetitive words, and only for the models that previously threw the x.split error. I'm not sure where the issue is now, or which part of the code is responsible for the repetition. I really need to fix this.
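One thing worth checking, as a guess rather than a confirmed diagnosis: in the v3 code this patch comes from, the places that read bpe_ranks build their keys the same way the constructor now does. If the bpe() method in your 2.17.2 copy still looks ranks up with the old space-joined string, the lookup always misses:

```js
// Patched constructor: keys are JSON-stringified pairs.
this.bpe_ranks = new Map(this.merges.map((x, i) => [JSON.stringify(x), i]));

// Hypothetical leftover lookup from 2.x (left/right are placeholder names):
const rank = this.bpe_ranks.get(left + ' ' + right);          // always undefined now

// It has to mirror the new key format instead:
const fixed = this.bpe_ranks.get(JSON.stringify([left, right]));
```

If no rank is ever found, BPE stops applying merges, the model is fed near character-level tokens it was never trained on, and degenerate, repetitive output is a typical symptom.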