Description
This repo contains GGUF format model files for yentinglin/Llama-3-Taiwan-70B-Instruct.
Provided files
Name | Quant method | Bits | Size |
---|---|---|---|
llama-3-taiwan-70b-instruct-q4_k_s.gguf | Q4_K_S | 4 | 40.3 GB |
llama-3-taiwan-70b-instruct-q4_k_m.gguf | Q4_K_M | 4 | 42.5 GB |
llama-3-taiwan-70b-instruct-q5_k_s.gguf | Q5_K_S | 5 | 48.7 GB |
llama-3-taiwan-70b-instruct-q5_k_m.gguf | Q5_K_M | 5 | 49.9 GB |
llama-3-taiwan-70b-instruct-q6_k.gguf | Q6_K | 6 | 59.89 GB |
llama-3-taiwan-70b-instruct-q8_0.gguf | Q8_0 | 8 | 75 GB |
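Any of the files above can be downloaded individually and run with a llama.cpp-compatible runtime. Below is a minimal sketch using huggingface_hub and llama-cpp-python; the Q4_K_M file is chosen only as an example, and the n_ctx/n_gpu_layers settings are illustrative assumptions rather than tuned values.

```python
# Minimal sketch: fetch one quantized file and load it with llama-cpp-python.
# Assumes `pip install huggingface_hub llama-cpp-python`; settings are illustrative.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF",
    filename="llama-3-taiwan-70b-instruct-q4_k_m.gguf",
)

llm = Llama(
    model_path=gguf_path,
    n_ctx=8192,       # the model supports an 8K context
    n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows (~42.5 GB for Q4_K_M)
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": "Please introduce yourself in Traditional Chinese."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```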
Original model card

Demo Site
Try out Llama-3-Taiwan interactively at twllm.com
Chatbot Arena
Participate in the exciting Chatbot Arena and compete against other chatbots!
We're excited to introduce Llama-3-Taiwan-70B! Llama-3-Taiwan-70B is a 70B-parameter model finetuned on a large corpus of Traditional Mandarin and English data using the Llama-3 architecture. It demonstrates state-of-the-art performance on various Traditional Mandarin NLP benchmarks.
The model was trained with the NVIDIA NeMo™ Framework on NVIDIA Taipei-1, built with NVIDIA DGX H100 systems.
The compute and data for training Llama-3-Taiwan-70B was generously sponsored by Chang Gung Memorial Hospital, Chang Chun Group, Legalsign.ai, NVIDIA, Pegatron, TechOrange, and Unimicron (in alphabetical order).
We would like to acknowledge the contributions of our data provider, team members and advisors in the development of this model, including shasha77 for high-quality YouTube scripts and study materials, Taiwan AI Labs for providing local media content, Ubitus K.K. for offering gaming content, Professor Yun-Nung (Vivian) Chen for her guidance and advisement, Wei-Lin Chen for leading our pretraining data pipeline, Tzu-Han Lin for synthetic data generation, Chang-Sheng Kao for enhancing our synthetic data quality, and Kang-Chieh Chen for cleaning instruction-following data.
Model Summary
Llama-3-Taiwan-70B is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue. Key features include:
- 70B parameters
- Languages: Traditional Mandarin (zh-tw), English (en)
- Finetuned on a high-quality Traditional Mandarin and English corpus covering general knowledge as well as industry knowledge in the legal, manufacturing, medical, and electronics domains
- 8K context length
- Open model released under the Llama-3 license
Training Details
- Training Framework: NVIDIA NeMo, NVIDIA NeMo Megatron
- Inference Framework: NVIDIA TensorRT-LLM
- Base model: Llama-3 70B
- Hardware: NVIDIA DGX H100 on Taipei-1
- Context length: 8K tokens (a 128K-context version is also available)
- Batch size: 2M tokens per step
Evaluation
Check out the Open TW LLM Leaderboard for the full, up-to-date list.
Model | TMLU | Taiwan Truthful QA | Legal Eval | TW MT-Bench | Long context | Function Calling | TMMLU+ |
---|---|---|---|---|---|---|---|
| | Subject knowledge | Taiwan localization test | Taiwan legal questions | Chinese multi-turn dialogue | Long-context support | Function calling | |
yentinglin/Llama-3-Taiwan-70B-Instruct | 74.76% | 80.95% | 68.42% | 7.54 | 128k version | โ | 67.53% |
yentinglin/Llama-3-Taiwan-70B-Instruct-DPO | 74.60% | 81.75% | 70.33% | - | - | โ | - |
yentinglin/Llama-3-Taiwan-70B-Instruct-128k | 73.01% | 80.16% | 63.64% | - | - | โ | - |
yentinglin/Llama-3-Taiwan-8B-Instruct | 59.50% | 61.11% | 53.11% | 7.21 | 128k version | โ | 52.28% |
yentinglin/Llama-3-Taiwan-8B-Instruct-DPO | 59.88% | 59.52% | 52.63% | - | - | โ | - |
yentinglin/Llama-3-Taiwan-8B-Instruct-128k | - | - | - | - | - | โ | - |
Claude-3-Opus | 73.59% (5-shot) | 69.84% | 60.29% | - | 200k | โ | - |
GPT4-o | 65.56% (0-shot), 69.88% (5-shot) | 76.98% | 53.59% | - | 128k | โ | - |
GPT4-turbo | 70.42% (5-shot) | - | - | - | 128k | โ | 60.34%^ |
Gemini-Pro | 61.40% (5-shot) | - | - | - | 1000k | โ | 49.92%^ |
GPT-3.5-turbo-1106 | 49.37% (5-shot) | - | - | 7.1 | 128k | โ | 41.76%^ |
Qwen1.5-110B-Chat | 75.69% | 66.67% | 49.28% | - | 32k | โ | 65.81% |
Yi-34B-Chat | 73.59% | 71.43% | 55.02% | 6.9 | 200k | โ | 64.10% |
Meta-Llama-3-70B-Instruct | 70.95% | 65.08% | 52.63% | - | 8k | โ | 62.75% |
Mixtral-8x22B-Instruct-v0.1 | 55.57% | 52.38% | 44.98% | - | 64k | โ | 52.16% |
Breexe-8x7B-Instruct-v0_1 | - | - | - | 7.2 | 8k | โ | 48.92% |
c4ai-command-r-plus | 62.87% | 64.29% | 34.45% | - | 128k | โ | 49.75% |
Meta-Llama-3-8B-Instruct | 55.81% | 46.83% | 35.89% | - | 8k | โ | 43.38% |
Breeze-7B-Instruct-v1_0 | 55.57% | 52.38% | 39.23% | 6.0 | 32k | โ | 41.77% |
Llama3-TAIDE-LX-8B-Chat-Alpha1 | 47.30% | 50.79% | 37.80% | - | 8k | โ | 39.03% |
Phi-3-mini-4k-instruct | 40.97% | 37.30% | 27.27% | - | 4k | โ | 33.02% |
Numbers are 0-shot by default.
^ Closest matching numbers taken from the original dataset.
Needle in a Haystack Evaluation
The "Needle in a ๅบๅธซ่กจ" evaluation tests the model's ability to locate and recall important information embedded within a large body of text, using the classic Chinese text ใๅบๅธซ่กจใ by ่ซธ่ไบฎ.
To run the evaluation, use the script.
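The authors' script is not reproduced here; the sketch below only illustrates the general needle-in-a-haystack procedure under stated assumptions: a hypothetical "needle" sentence is planted at different depths of a long passage (a hypothetical chu_shi_biao.txt), and the model, served through an OpenAI-compatible endpoint such as the vLLM setup shown later, is asked to retrieve it.

```python
# Illustrative needle-in-a-haystack probe (not the authors' evaluation script).
# Assumes an OpenAI-compatible server is running (see the vLLM section below).
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

haystack = open("chu_shi_biao.txt", encoding="utf-8").read()  # hypothetical long source text
needle = "The secret passphrase is 'formosa-42'."             # hypothetical planted fact

def probe(depth_ratio: float) -> str:
    # Plant the needle at the given relative depth and ask the model to recall it.
    cut = int(len(haystack) * depth_ratio)
    context = haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]
    resp = client.chat.completions.create(
        model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
        messages=[
            {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
            {"role": "user", "content": context + "\n\nWhat is the secret passphrase?"},
        ],
        temperature=0.0,
    )
    return resp.choices[0].message.content

for depth in (0.1, 0.5, 0.9):
    print(depth, probe(depth))
```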
TW MT-Bench Score
- Average Score: 7.5375
- Maximum Score: 10
- Minimum Score: 1
- Median Score: 9.0
- Standard Deviation: 3.0349783771882133
- Total Number of Scores: 160
- Model response
- GPT-4 Eval
- Code forked from mtkresearch/TCEval, with bug fixes
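For reference, the numbers above are ordinary descriptive statistics over the 160 per-response GPT-4 scores; a minimal sketch of how they can be recomputed is shown below (scores.json is a hypothetical file holding the raw score list).

```python
# Recompute the TW MT-Bench summary statistics from the raw GPT-4 scores.
# `scores.json` is a hypothetical file containing a list of 160 scores (1-10).
import json
import statistics

with open("scores.json") as f:
    scores = json.load(f)

print("Average:", statistics.mean(scores))
print("Maximum:", max(scores))
print("Minimum:", min(scores))
print("Median:", statistics.median(scores))
print("Std dev:", statistics.stdev(scores))
print("Count:", len(scores))
```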
Use Cases
Llama-3-Taiwan-70B can be applied to a wide variety of NLP tasks in Traditional Chinese and English, including:
1. Multi-turn dialogue
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project. User: Hi, hello! Assistant: Hello! How can I assist you today? User: Just want to chat.... Assistant: Of course, I'd be happy to chat with you! Is there anything in particular you'd like to talk about?
2. RAG (Retrieval-Augmented Generation)
Demo: you can enable Search Web on twllm.com
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project. User: What will the weather be like at NTU tomorrow? Assistant: According to the weather forecast, tomorrow at NTU there will be showers, with a low of 24°C, east winds of 10 to 15 km/h, and a 40% chance of rain.
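The Search Web demo itself lives on twllm.com; the snippet below is only a minimal sketch of the same retrieval-augmented pattern, where a hypothetical search_web helper supplies retrieved snippets that are injected into the prompt before generation (the endpoint assumes the vLLM server from the Get Started section).

```python
# Minimal RAG sketch: retrieved snippets are injected into the prompt before generation.
# `search_web` is a hypothetical retriever; the endpoint assumes the vLLM server shown below.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

def search_web(query: str) -> list[str]:
    # Placeholder: plug in your own web search or vector-store retrieval here.
    return ["Forecast for NTU tomorrow: showers, low of 24°C, east wind 10-15 km/h, 40% chance of rain."]

question = "What will the weather be like at NTU tomorrow?"
snippets = "\n".join(search_web(question))

resp = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": f"Reference material:\n{snippets}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
```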
3. Formatted output, language understanding, entity recognition, function calling
If you are interested in function calling, I strongly recommend using constrained decoding to turn on JSON mode.
Example from HW7 in INTRODUCTION TO GENERATIVE AI 2024 SPRING by HUNG-YI LEE (李宏毅)
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project. User: You are a movie review analyst responsible for sentiment analysis. Analyze each sentence under review, evaluate the importance of every word and punctuation mark in the sentence, and list the results as tuples. Tuple values range from -1 (negative) to 1 (positive). Then classify the review as positive (1) or negative (-1). The sentences to be reviewed are enclosed between the tags. Output example: [(<word or punctuation>, ), ...] According to foreign media reports, Honduras severed diplomatic ties with Taiwan in March last year and, after switching recognition to China, has yet to receive China's financial aid; the promised imports of Honduran white shrimp have also been delayed for a long time, and the white shrimp price has even been slashed to far below the price of its previous trade with Taiwan. With Mexico also recently suspending its white shrimp trade with Honduras, it is one blow after another for local shrimp farmers. Assistant: [('According to', 0.2), ('foreign media', 0.1), ('reports', -0.1), (',', 0.05), ('Honduras', -0.3), ('severed diplomatic ties with Taiwan in March last year,', -0.5), ('and after switching recognition to China,', -0.4), ('has yet to receive China's financial aid,', -1), ('the promised imports of Honduran white shrimp have also been delayed for a long time,', -1), ('and the white shrimp price has even been slashed to far below the price of its previous trade with Taiwan; with Mexico also recently cutting off its white shrimp trade with Honduras, it is one blow after another for local shrimp farmers.', -1)] -1
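As recommended above, constrained decoding makes structured outputs such as these tuples far more reliable. The sketch below assumes a serving stack whose OpenAI-compatible endpoint honours JSON mode via the response_format parameter; if yours does not, use its native guided or grammar-based decoding options instead.

```python
# Sketch of JSON-mode / constrained decoding through an OpenAI-compatible endpoint.
# Assumption: the server honours response_format={"type": "json_object"}.
import json
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project. Reply only with a JSON object."},
        {"role": "user", "content": 'Classify the sentiment of this review as {"label": 1} for positive or {"label": -1} for negative: "The plot was dull and the acting was worse."'},
    ],
    response_format={"type": "json_object"},
    temperature=0.0,
)
print(json.loads(resp.choices[0].message.content))
```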
Get Started
Caveat: System message should always be set.
Hugging Face Transformers library
You can use Llama-3-Taiwan-70B with the Hugging Face Transformers library:
import torch
from transformers import pipeline, StoppingCriteria

# Define a custom stopping criteria class
class EosListStoppingCriteria(StoppingCriteria):
    def __init__(self, eos_sequence=[128256]):
        self.eos_sequence = eos_sequence

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop as soon as the tail of any sequence in the batch matches the EOS sequence
        last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
        return self.eos_sequence in last_ids

# Initialize the model with automatic device mapping
llm = pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1", device_map="auto")
tokenizer = llm.tokenizer

# Define a conversation example
chat = [
    {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
    {"role": "user", "content": "Hello! May I ask what tasks you can complete?"},
    {"role": "assistant", "content": "Hello! I can help you solve all kinds of problems, provide information, and assist with many kinds of tasks. For example: answering technical questions, offering suggestions, translating text, finding information, or helping you plan an itinerary. Please tell me how I can help you."},
    {"role": "user", "content": "That's great!"}
]
flatten_chat_for_generation = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
"""
<|im_start|>user
You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.<|im_end|>
<|im_start|>user
Hello! May I ask what tasks you can complete?<|im_end|>
<|im_start|>assistant
Hello! I can help you solve all kinds of problems, provide information, and assist with many kinds of tasks. For example: answering technical questions, offering suggestions, translating text, finding information, or helping you plan an itinerary. Please tell me how I can help you.<|im_end|>
<|im_start|>user
That's great!<|im_end|>
<|im_start|>assistant
"""

# Generate a response using the custom stopping criteria
output = llm(flatten_chat_for_generation, return_full_text=False, max_new_tokens=128, top_p=0.9, temperature=0.7, stopping_criteria=[EosListStoppingCriteria([tokenizer.eos_token_id])])
print(output[0]['generated_text'])
"Thank you! I'm glad to be able to help you. If there is anything else you need assistance with, please feel free to contact me at any time. I will do my best to provide the support you need."
vLLM
Start the server
export NUM_GPUS=4
export PORT=8000
docker run \
-e HF_TOKEN=$HF_TOKEN \
--gpus '"device=0,1,2,3"' \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p "${PORT}:8000" \
--ipc=host \
vllm/vllm-openai:v0.4.0.post1 \
--model "yentinglin/Llama-3-Taiwan-70B-Instruct-rc1" \
-tp "${NUM_GPUS}"
Sample client code (you can also use any OpenAI-API-compatible client):
# pip install "openai>=1.0.0"
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
chat_response = client.chat.completions.create(
model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a joke."},
]
)
print("Chat response:", chat_response)
Enjoy exploring the capabilities of Llama-3-Taiwan-70B! We look forward to seeing what you create with this powerful open-source model. If you have any questions or feedback, please let us know.
Contributions
- Professor Yun-Nung (Vivian) Chen, for her guidance and advisement throughout the project.
- Wei-Lin Chen, for leading our pretraining data pipeline.
- Tzu-Han Lin, for synthetic data generation.
- Chang-Sheng Kao, for enhancing our synthetic data quality.
- Kang-Chieh Chen, for cleaning instruction-following data.
- Min-Yi Chen and Shao-Heng Hsu, for collecting chemical engineering data and benchmarks.
- Chung-Yao Ma, Jonathan Guo, and Kai-Chun Chang, for collecting manufacturing and electrical engineering data and benchmarks, and for project progress management.
Citation
@article{DBLP:journals/corr/abs-2311-17487,
author = {Yen{-}Ting Lin and
Yun{-}Nung Chen},
title = {Taiwan {LLM:} Bridging the Linguistic Divide with a Culturally Aligned
Language Model},
journal = {CoRR},
volume = {abs/2311.17487},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2311.17487},
doi = {10.48550/ARXIV.2311.17487},
eprinttype = {arXiv},
eprint = {2311.17487},
timestamp = {Tue, 05 Dec 2023 14:40:42 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2311-17487.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2403-20180,
author = {Po{-}Heng Chen and
Sijia Cheng and
Wei{-}Lin Chen and
Yen{-}Ting Lin and
Yun{-}Nung Chen},
title = {Measuring Taiwanese Mandarin Language Understanding},
journal = {CoRR},
volume = {abs/2403.20180},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2403.20180},
doi = {10.48550/ARXIV.2403.20180},
eprinttype = {arXiv},
eprint = {2403.20180},
timestamp = {Wed, 10 Apr 2024 17:37:45 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2403-20180.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Model tree for chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF
Base model: meta-llama/Meta-Llama-3-70B