Model Description
Our Llama-3.2-1B-Instruct-uz (experimental) model was continually pretrained on 1.2B tokens (80% English, 20% Uzbek) with a context length of 2048 tokens, then fine-tuned with supervised fine-tuning (SFT). Our customized tokenizer averages 1.7 tokens per Uzbek word vs. ~3.5 in the original Llama models, which means roughly 2x faster inference and a longer effective context length on Uzbek text. With quantization, the model runs on just 2 GB of VRAM, making it suitable for small GPUs, edge devices, or even mobile scenarios.
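Tokenizer fertility, the average number of tokens per whitespace-separated word, is the quantity behind the 1.7 vs. ~3.5 comparison. The sketch below shows one way it could be measured. The `fertility` helper and the toy tokenizer are illustrative assumptions, not the script used to obtain the figures above; in practice one would pass something like the `tokenize` method of a Hugging Face tokenizer as the `tokenize` argument.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    if n_words == 0:
        raise ValueError("no words in input")
    return n_tokens / n_words


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into 2-character pieces."""
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return pieces


sample = ["Menga Alisher Navoiy haqida aytib ber."]
print(f"fertility: {fertility(sample, toy_tokenize):.2f}")
```

Running the same function with two different tokenizers on the same Uzbek corpus gives a like-for-like comparison, since the word count in the denominator does not depend on the tokenizer.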
Benchmarks
Model | BLEU Uz→En (Zero_shot) | BLEU En→Uz (Zero_shot) | COMET Uz→En | COMET En→Uz | Uzbek Sentiment Analysis | Uzbek News Classification | MMLU (English) (Zero_shot) |
---|---|---|---|---|---|---|---|
Llama-3.2 1B Instruct | 3.62 | 0.44 | 56.72 | 35.52 | 54.77 | 42.16 | 38.15 |
Llama-3.2 1B Instruct Uz | 10.33 | 5.29 | 74.39 | 72.34 | 65.25 | 17.14 | 27.20 |
Llama-3.2 3B Instruct | 11.91 | 2.54 | 71.96 | 55.62 | 56.01 | 70.60 | 52.04 |
Llama-3.2 3B Instruct Uz | 20.47 | 9.18 | 83.20 | 80.71 | 77.55 | 41.43 | 45.91 |
Llama-3.1 8B Instruct | 24.23 | 8.28 | 83.12 | 82.22 | 69.77 | 73.63 | 60.59 |
The results show that our Uzbek-optimized models consistently outperform their base counterparts on the translation benchmarks (BLEU and COMET, measured on the FLORES+ Uz-En / En-Uz evaluation datasets) and on Uzbek sentiment analysis. On MMLU, which measures general language understanding across multiple tasks in English, and on news classification, our Uzbek-optimized models score lower than their base counterparts (for example, 27.20 vs. 38.15 on MMLU and 17.14 vs. 42.16 on news classification for the 1B model), which we attribute to catastrophic forgetting of the original English instruction following. (The official Llama models' MMLU scores may differ from ours because of our evaluation method; see Information on Evaluation Method below for details.)
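To make the trade-off concrete, the per-metric change between each base model and its Uzbek-optimized counterpart can be derived directly from the table. The sketch below only subtracts the reported scores; the dictionary layout, the shortened metric names, and the `deltas` helper are illustrative choices.

```python
# Scores copied from the benchmark table above (same column order).
METRICS = [
    "BLEU Uz-En", "BLEU En-Uz", "COMET Uz-En", "COMET En-Uz",
    "Sentiment", "News", "MMLU",
]
SCORES = {
    "1B base": [3.62, 0.44, 56.72, 35.52, 54.77, 42.16, 38.15],
    "1B Uz": [10.33, 5.29, 74.39, 72.34, 65.25, 17.14, 27.20],
    "3B base": [11.91, 2.54, 71.96, 55.62, 56.01, 70.60, 52.04],
    "3B Uz": [20.47, 9.18, 83.20, 80.71, 77.55, 41.43, 45.91],
}


def deltas(base: str, tuned: str) -> dict:
    """Score of the tuned model minus score of the base model, per metric."""
    return {
        metric: round(t - b, 2)
        for metric, b, t in zip(METRICS, SCORES[base], SCORES[tuned])
    }


for size in ("1B", "3B"):
    print(size, deltas(f"{size} base", f"{size} Uz"))
```

Positive values are gains from the Uzbek adaptation and negative values are regressions, so the translation and sentiment columns come out positive while the news classification and MMLU columns come out negative.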
Looking ahead, these models are only experimental checkpoints with room for improvement. We’re eager to see how they will contribute to Uzbek open source and be used by our Uzbek 🇺🇿 community. 🚀
How to use
The Llama-3.2-1B-Instruct-uz model can be used with transformers in the following way. We recommend preprocessing Uzbek input by replacing each apostrophe (') with the sequence APST, which is needed to achieve our model's lower tokenizer fertility.
Use with transformers
import re

import langid
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

DTYPE = torch.bfloat16
MODEL_ID = "bxod/Llama-3.2-1B-Instruct-uz"
# Apostrophe-like characters that are mapped to the APST placeholder
PATTERN = r"[’‘‚‛ʻʼʽʾʿˈˊˋˌˍ']"

tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
tok.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto",
)

EOT = "<|eot_id|>"
SYSTEM = (
    f"{tok.bos_token}<|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful assistant<|eot_id|>"
)


def prompt(user: str) -> str:
    # Build a single-turn chat prompt that ends with the assistant header
    return (
        SYSTEM +
        "<|start_header_id|>user<|end_header_id|>\n" +
        f"{user}{EOT}" +
        "<|start_header_id|>assistant<|end_header_id|>"
    )


def generate(user: str, max_new: int = 256) -> str:
    # Replace apostrophes with APST for non-English input only
    lang, _ = langid.classify(user)
    clean_text = re.sub(PATTERN, "APST", user) if lang != "en" else user
    # Move inputs to wherever device_map="auto" placed the model
    enc = tok(prompt(clean_text), return_tensors="pt").to(model.device)
    out = model.generate(
        **enc,
        max_new_tokens=max_new,
        bos_token_id=tok.bos_token_id,
        eos_token_id=tok.convert_tokens_to_ids(EOT),
        pad_token_id=tok.pad_token_id,
        do_sample=False,
    )
    # Keep only the assistant turn and restore apostrophes
    txt = tok.decode(out[0], skip_special_tokens=False)
    txt = txt.split("<|start_header_id|>assistant<|end_header_id|>", 1)[1]
    return txt.split(EOT, 1)[0].replace("APST", "'").strip()


print(generate("Menga Alisher Navoiy haqida aytib ber."))
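The apostrophe handling used in `generate` can be checked in isolation, without loading the model. The sketch below uses a reduced character class and hypothetical helper names (`encode_apostrophes`, `decode_apostrophes`); it assumes, as the snippet above does, that the literal string APST does not otherwise occur in the text.

```python
import re

# Reduced version of PATTERN: U+2019, U+2018, U+02BB, U+02BC and the ASCII apostrophe.
APOSTROPHES = re.compile("[\u2019\u2018\u02bb\u02bc']")


def encode_apostrophes(text: str) -> str:
    """Replace every apostrophe-like character with the APST placeholder."""
    return APOSTROPHES.sub("APST", text)


def decode_apostrophes(text: str) -> str:
    """Map the APST placeholder back to a plain ASCII apostrophe."""
    return text.replace("APST", "'")


raw = "O\u02bbzbekiston ta\u02bclim tizimi"
encoded = encode_apostrophes(raw)
print(encoded)
print(decode_apostrophes(encoded))
```

Note that the round trip is not lossless: every apostrophe variant comes back as the ASCII apostrophe, which matches what `generate` returns.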
Information on Evaluation Method
To evaluate on the translation task, we used FLORES+ Uz-En / En-Uz datasets. We used the following prompt to do zero-shot Uz-En evaluation both for the base model and Uzbek-optimized model (for En-Uz eval, we changed the positions of the words "English" and "Uzbek").
prompt = (
    f"Input: {clean_text} \n\nYour task is to accurately translate the given Uzbek text into English.\n"
    "Output only the English translation, without any additional comments.\n"
    "\nPlease translate the following Uzbek text into English."
)
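Since the En-Uz prompt differs from the Uz-En prompt only in the positions of the two language names, both can be produced by one function. This is a minimal sketch with a hypothetical `translation_prompt` helper; it assumes `text` has already been preprocessed (apostrophes replaced) where applicable.

```python
def translation_prompt(text: str, source: str = "Uzbek", target: str = "English") -> str:
    """Build the zero-shot translation prompt for the given direction."""
    return (
        f"Input: {text} \n\nYour task is to accurately translate the given {source} text into {target}.\n"
        f"Output only the {target} translation, without any additional comments.\n"
        f"\nPlease translate the following {source} text into {target}."
    )


uz_en = translation_prompt("Salom, dunyo!")
en_uz = translation_prompt("Hello, world!", source="English", target="Uzbek")
print(uz_en)
print(en_uz)
```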
To assess the model's ability in Uzbek sentiment analysis, we used the risqaliyevds/uzbek-sentiment-analysis dataset (refer to behbudiy/uzbek-sentiment-analysis dataset). We used the following prompt for the evaluation:
prompt = f'''Input: {clean_text} \n\nGiven the following text, determine the sentiment as either 'Positive' or 'Negative.' Respond with only the word 'Positive' or 'Negative' without any additional text or explanation."
'''
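Scoring this task requires mapping the raw response to one of the two labels. The sketch below is one possible way to do that and to compute a percentage accuracy; the `parse_sentiment` and `accuracy` helpers, the sample outputs, and the choice of accuracy as the metric are assumptions for illustration, not the evaluation script behind the table.

```python
from typing import List, Optional


def parse_sentiment(output: str) -> Optional[str]:
    """Map a raw model response to 'Positive' or 'Negative', or None if unclear."""
    cleaned = output.strip().strip("'\".").lower()
    if cleaned.startswith("positive"):
        return "Positive"
    if cleaned.startswith("negative"):
        return "Negative"
    return None


def accuracy(outputs: List[str], gold: List[str]) -> float:
    """Percentage of responses whose parsed label equals the gold label."""
    correct = sum(parse_sentiment(o) == g for o, g in zip(outputs, gold))
    return 100.0 * correct / len(gold)


outputs = ["Positive", " negative.", "I think it is positive"]
gold = ["Positive", "Negative", "Positive"]
print(f"{accuracy(outputs, gold):.2f}")
```

Responses that do not start with one of the two labels are counted as wrong here, which penalizes a model that ignores the "respond with only the word" instruction.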
For Uzbek News Classification, we used the risqaliyevds/uzbek-zero-shot-classification dataset and asked the model to predict the category of the news article using the following prompt:
prompt = f'''Input: {clean_text}\n\nClassify the given news article in Uzbek.
0 - Siyosat - If the text is about politics.
1 - Iqtisodiyot - If the text is about the economy.
2 - Texnologiya - If the text is about technology.
3 - Sport - If the text is about sports.
4 - Madaniyat - If the text is about culture.
5 - Salomatlik - If the text is about health.
6 - Oila va Jamiyat - If the text is about family and society.
7 - TaAPSTlim - If the text is about education.
8 - Ekologiya - If the text is about ecology.
9 - Xorijiy Yangiliklar - If the text is about foreign news.
Print only one digit ID of the corresponding class.
'''
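Because the prompt asks for a single digit, the response can be turned into a class ID by taking the first digit that appears in it. The sketch below illustrates that idea; `parse_class_id` is a hypothetical helper, and the category names are written with a plain apostrophe rather than the APST placeholder.

```python
import re
from typing import Optional

# Class IDs and names as listed in the prompt above.
CATEGORIES = {
    0: "Siyosat", 1: "Iqtisodiyot", 2: "Texnologiya", 3: "Sport",
    4: "Madaniyat", 5: "Salomatlik", 6: "Oila va Jamiyat", 7: "Ta'lim",
    8: "Ekologiya", 9: "Xorijiy Yangiliklar",
}


def parse_class_id(output: str) -> Optional[int]:
    """Return the first digit found in the response, or None if there is none."""
    match = re.search(r"\d", output)
    return int(match.group()) if match else None


for response in ["3", " 7 - TaAPSTlim", "Sport"]:
    class_id = parse_class_id(response)
    print(repr(response), "->", CATEGORIES.get(class_id))
```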
On MMLU, we performed 0-shot evaluation using the following template and extracted the first token generated by the model for measuring accuracy:
template = "Given the above question and choices, choose the single best answer (A, B, C, or D). Respond with only one letter."
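The accuracy computation described above can be sketched as follows. As a simplification, this example approximates "the first generated token" by the first non-whitespace character of the decoded response; the helper names and sample data are hypothetical.

```python
from typing import List, Optional


def first_choice(output: str) -> Optional[str]:
    """Return the leading answer letter (A-D) of a response, or None."""
    stripped = output.lstrip()
    if stripped and stripped[0].upper() in "ABCD":
        return stripped[0].upper()
    return None


def mmlu_accuracy(outputs: List[str], answers: List[str]) -> float:
    """Percentage of responses whose leading letter equals the gold answer."""
    hits = sum(first_choice(o) == a for o, a in zip(outputs, answers))
    return 100.0 * hits / len(answers)


outputs = ["A", "b. because", "D"]
answers = ["A", "B", "C"]
print(f"{mmlu_accuracy(outputs, answers):.2f}")
```

A character-level check like this would also accept a response such as "An answer cannot be given" as choice A, so it only makes sense together with an instruction to respond with a single letter.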
More
For more details and examples, refer to the base model below: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
Base model: meta-llama/Llama-3.2-1B-Instruct