---
license: other
library_name: transformers
tags:
- llama-factory
- full
- generated_from_trainer
base_model: hon9kon9ize/CantoneseLLM-v1.0-32B-cpt
model-index:
- name: CantoneseLLMChat-v1.0-32B
  results: []
---


# CantoneseLLMChat-v1.0-32B

![front_image](cantonese_llm_v1.jpg)


Cantonese LLM Chat v1.0 is the first-generation Cantonese LLM from hon9kon9ize.
Building on the success of the [v0.5 preview](https://huggingface.co/hon9kon9ize/CantoneseLLMChat-v0.5), the model excels at Hong Kong-specific knowledge and Cantonese conversation.

## Model description
The base model was obtained by continued pre-training (CPT) of [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B) on 600 million publicly available Hong Kong news articles and Cantonese websites.
The instruction fine-tuned model was then trained on a dataset of 75,000 instruction pairs, of which 45,000 were Cantonese instructions generated by other LLMs and reviewed by humans.

The model was trained on 16 Nvidia H100 96GB HBM2e GPUs on the [Genkai Supercomputer](https://www.cc.kyushu-u.ac.jp/scp/eng/system/Genkai/hardware/).

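## Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "hon9kon9ize/CantoneseLLMChat-v1.0-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place (and, if needed, shard) the weights across available devices
)


def chat(messages, temperature=0.9, max_new_tokens=200):
    # Render the conversation with the model's chat template and move it to the model's device
    input_ids = tokenizer.apply_chat_template(
        conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # do_sample=True so that the temperature setting actually takes effect
    output_ids = model.generate(
        input_ids, max_new_tokens=max_new_tokens, do_sample=True, temperature=temperature
    )
    # Decode only the newly generated tokens; special tokens such as <|im_end|> are kept
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
    return response


prompt = "邊個係香港特首？"  # "Who is the Chief Executive of Hong Kong?"
messages = [
    {"role": "system", "content": "you are a helpful assistant."},
    {"role": "user", "content": prompt},
]
# Example output ("The Chief Executive of the Hong Kong SAR is John Lee Ka-chiu."):
print(chat(messages))  # 香港特別行政區行政長官係李家超。<|im_end|>
```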
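Because the example decodes with `skip_special_tokens=False`, the reply ends with the `<|im_end|>` end-of-turn marker. For a multi-turn conversation it is usually cleaner to strip that marker before adding the reply back to the message history. The following is a minimal sketch of one way to do that; `strip_end_token` and `add_turn` are hypothetical helper names introduced here for illustration and are not part of the model or of `transformers`.

```python
END_OF_TURN = "<|im_end|>"  # end-of-turn marker seen in the example output


def strip_end_token(response: str, marker: str = END_OF_TURN) -> str:
    """Remove a trailing end-of-turn marker and surrounding whitespace."""
    text = response.strip()
    if text.endswith(marker):
        text = text[: -len(marker)]
    return text.strip()


def add_turn(messages: list[dict], user_text: str, assistant_text: str) -> list[dict]:
    """Return a new history with one completed user/assistant exchange appended."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": strip_end_token(assistant_text)},
    ]


history = [{"role": "system", "content": "you are a helpful assistant."}]
# Record the first exchange, using the example reply as the assistant text.
history = add_turn(history, "邊個係香港特首？", "香港特別行政區行政長官係李家超。<|im_end|>")
# A follow-up question ("When did he take office?") can now be appended
# and the whole list passed to a chat function such as the one in Basic Usage.
history.append({"role": "user", "content": "佢幾時上任？"})
```

Alternatively, decoding with `skip_special_tokens=True` drops special tokens at decode time, in which case the stripping step is unnecessary.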

	## Performance
Best-in-class open-source LLM for understanding Cantonese and Hong Kong culture on the [HK-Eval benchmark](https://arxiv.org/pdf/2503.12440).
However, as the table below shows, reasoning models perform dramatically better than their non-reasoning counterparts. We are currently working on reasoning models for v2.

| Model                     | HK Culture (zero-shot) | Cantonese Linguistics |
|---------------------------|:----------------------:|:---------------------:|
| CantoneseLLMChat v0.5 6B  | 52.0% | 12.8% |
| CantoneseLLMChat v0.5 34B | 72.5% | 54.5% |
| CantoneseLLMChat v1.0 3B  | 56.0% | 45.7% |
| CantoneseLLMChat v1.0 7B  | 60.3% | 46.5% |
| CantoneseLLMChat v1.0 32B | 69.8% | 52.7% |
| CantoneseLLMChat v1.0 72B | 75.4% | 59.6% |
| Llama 3.1 8B Instruct     | 45.6% | 35.1% |
| Llama 3.1 70B Instruct    | 63.0% | 50.3% |
| Qwen2.5 7B Instruct       | 51.2% | 30.3% |
| Qwen2.5 32B Instruct      | 59.9% | 45.1% |
| Qwen2.5 72B Instruct      | 65.9% | 45.9% |
| Claude 3.5 Sonnet         | 71.7% | 63.2% |
| DeepSeek R1               | 88.8% | 77.5% |
| Gemini 2.0 Flash          | 80.2% | 75.3% |
| Gemini 2.5 Pro            | 92.1% | 87.3% |
| GPT-4o                    | 77.5% | 63.8% |
| GPT-4o-mini               | 55.6% | 57.3% |
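
To make the table easier to read, the sketch below computes the percentage-point gap between each v1.0 model and the Qwen2.5 Instruct model of the same size. The scores are copied from the table. Pairing models by size is a choice made here for illustration: this card only states that the 32B model derives from Qwen 2.5 32B. The `gain` helper is a hypothetical name.

```python
# (HK Culture zero-shot %, Cantonese Linguistics %) copied from the table above.
scores = {
    "CantoneseLLMChat v1.0 7B": (60.3, 46.5),
    "CantoneseLLMChat v1.0 32B": (69.8, 52.7),
    "CantoneseLLMChat v1.0 72B": (75.4, 59.6),
    "Qwen2.5 7B Instruct": (51.2, 30.3),
    "Qwen2.5 32B Instruct": (59.9, 45.1),
    "Qwen2.5 72B Instruct": (65.9, 45.9),
}


def gain(size: str) -> tuple[float, ...]:
    """Percentage-point difference (v1.0 minus Qwen2.5 Instruct) for one model size."""
    tuned = scores[f"CantoneseLLMChat v1.0 {size}"]
    reference = scores[f"Qwen2.5 {size} Instruct"]
    # Round to one decimal to avoid floating-point noise such as 9.899999...
    return tuple(round(t - r, 1) for t, r in zip(tuned, reference))


for size in ("7B", "32B", "72B"):
    culture, linguistics = gain(size)
    print(f"{size}: +{culture} HK Culture, +{linguistics} Cantonese Linguistics")
```

For the 32B model this works out to roughly 9.9 points on HK Culture and 7.6 points on Cantonese Linguistics over Qwen2.5 32B Instruct.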