Description

This repo contains GGUF format model files for yentinglin/Llama-3-Taiwan-70B-Instruct.
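
Since these are GGUF files, they can be loaded with llama.cpp or its bindings. Below is a minimal sketch using llama-cpp-python; the file name is a placeholder for whichever quantization you download, and the generation settings are illustrative only.

# Minimal sketch: run one of the GGUF files with llama-cpp-python.
# "Llama-3-Taiwan-70B-Instruct.Q4_K_M.gguf" is a placeholder file name; use the quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3-Taiwan-70B-Instruct.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,       # the model supports an 8K context window
    n_gpu_layers=-1,  # offload all layers to the GPU if VRAM allows
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": "Hello! What can you help me with?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])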

Provided files

Original model card

Taiwan LLM Logo

🚀 Demo Site

Try out Llama-3-Taiwan interactively at twllm.com

โš”๏ธ Chatbot Arena

Participate in the exciting Chatbot Arena and compete against other chatbots!

🚀 We're excited to introduce Llama-3-Taiwan-70B, a 70B-parameter model finetuned on a large corpus of Traditional Mandarin and English data using the Llama-3 architecture. It demonstrates state-of-the-art performance on various Traditional Mandarin NLP benchmarks.

The model was trained with the NVIDIA NeMo™ Framework on NVIDIA Taipei-1, built with NVIDIA DGX H100 systems.

The compute and data for training Llama-3-Taiwan-70B were generously sponsored by Chang Gung Memorial Hospital, Chang Chun Group, Legalsign.ai, NVIDIA, Pegatron, TechOrange, and Unimicron (in alphabetical order).

We would like to acknowledge the contributions of our data providers, team members, and advisors in the development of this model, including shasha77 for high-quality YouTube scripts and study materials, Taiwan AI Labs for providing local media content, Ubitus K.K. for offering gaming content, Professor Yun-Nung (Vivian) Chen for her guidance and advisement, Wei-Lin Chen for leading our pretraining data pipeline, Tzu-Han Lin for synthetic data generation, Chang-Sheng Kao for enhancing our synthetic data quality, and Kang-Chieh Chen for cleaning instruction-following data.

Model Summary

Llama-3-Taiwan-70B is a large language model finetuned for Traditional Mandarin and English users. It has strong capabilities in language understanding, generation, reasoning, and multi-turn dialogue. Key features include:

  • 70B parameters
  • Languages: Traditional Mandarin (zh-tw), English (en)
  • Finetuned on a high-quality Traditional Mandarin and English corpus covering general knowledge as well as industry knowledge in the legal, manufacturing, medical, and electronics domains
  • 8K context length
  • Open model released under the Llama-3 license

Training Details

Evaluation

Check out the Open TW LLM Leaderboard for the full and updated list.

| Model | TMLU (subject knowledge) | Taiwan Truthful QA (Taiwan localization) | Legal Eval (Taiwan legal exams) | TW MT-Bench (Chinese multi-turn dialogue) | Long context (long-text support) | Function Calling | TMMLU+ |
|---|---|---|---|---|---|---|---|
| yentinglin/Llama-3-Taiwan-70B-Instruct | 74.76% | 80.95% | 68.42% | 7.54 | 128k version | ✅ | 67.53% |
| yentinglin/Llama-3-Taiwan-70B-Instruct-DPO | 74.60% | 81.75% | 70.33% | - | - | ✅ | - |
| yentinglin/Llama-3-Taiwan-70B-Instruct-128k | 73.01% | 80.16% | 63.64% | - | - | ✅ | - |
| yentinglin/Llama-3-Taiwan-8B-Instruct | 59.50% | 61.11% | 53.11% | 7.21 | 128k version | ✅ | 52.28% |
| yentinglin/Llama-3-Taiwan-8B-Instruct-DPO | 59.88% | 59.52% | 52.63% | - | - | ✅ | - |
| yentinglin/Llama-3-Taiwan-8B-Instruct-128k | - | - | - | - | - | ✅ | - |
| Claude-3-Opus | 73.59% (5-shot) | 69.84% | 60.29% | - | 200k | ✅ | - |
| GPT4-o | 65.56% (0-shot), 69.88% (5-shot) | 76.98% | 53.59% | - | 128k | ✅ | - |
| GPT4-turbo | 70.42% (5-shot) | - | - | - | 128k | ✅ | 60.34%^ |
| Gemini-Pro | 61.40% (5-shot) | - | - | - | 1000k | ✅ | 49.92%^ |
| GPT-3.5-turbo-1106 | 49.37% (5-shot) | - | - | 7.1 | 128k | ✅ | 41.76%^ |
| Qwen1.5-110B-Chat | 75.69% | 66.67% | 49.28% | - | 32k | ✅ | 65.81% |
| Yi-34B-Chat | 73.59% | 71.43% | 55.02% | 6.9 | 200k | ✅ | 64.10% |
| Meta-Llama-3-70B-Instruct | 70.95% | 65.08% | 52.63% | - | 8k | ✅ | 62.75% |
| Mixtral-8x22B-Instruct-v0.1 | 55.57% | 52.38% | 44.98% | - | 64k | ✅ | 52.16% |
| Breexe-8x7B-Instruct-v0_1 | - | - | - | 7.2 | 8k | ❓ | 48.92% |
| c4ai-command-r-plus | 62.87% | 64.29% | 34.45% | - | 128k | ✅ | 49.75% |
| Meta-Llama-3-8B-Instruct | 55.81% | 46.83% | 35.89% | - | 8k | ✅ | 43.38% |
| Breeze-7B-Instruct-v1_0 | 55.57% | 52.38% | 39.23% | 6.0 | 32k | ❓ | 41.77% |
| Llama3-TAIDE-LX-8B-Chat-Alpha1 | 47.30% | 50.79% | 37.80% | - | 8k | ❓ | 39.03% |
| Phi-3-mini-4k-instruct | 40.97% | 37.30% | 27.27% | - | 4k | ❓ | 33.02% |

Numbers are 0-shot unless otherwise noted.

Eval implementation

^ Numbers taken from the closest matching entries in the original dataset.

Needle in a Haystack Evaluation

The "Needle in a Chu Shi Biao" evaluation tests the model's ability to locate and recall important information embedded within a large body of text, using the classic Chinese essay Chu Shi Biao ("Memorial on the Northern Expedition") by Zhuge Liang.

To run the evaluation, use the script.
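
For intuition, the test can be sketched as follows: place a "needle" sentence at varying depths inside a long filler document and ask the model to retrieve it. The outline below is only an illustration (the filler file, endpoint, and model name are assumptions matching the vLLM setup later in this card), not the authors' evaluation script.

# Illustrative needle-in-a-haystack outline; not the authors' evaluation script.
# Assumes an OpenAI-compatible endpoint such as the vLLM server shown in the Get Started section.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

needle = "The secret passphrase is 'plum blossom'."
haystack = open("long_filler_text.txt", encoding="utf-8").read()  # placeholder filler document

for depth in (0.1, 0.5, 0.9):  # insert the needle at 10%, 50%, and 90% of the document
    pos = int(len(haystack) * depth)
    context = haystack[:pos] + "\n" + needle + "\n" + haystack[pos:]
    reply = client.chat.completions.create(
        model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": context + "\n\nWhat is the secret passphrase?"},
        ],
    )
    print(depth, reply.choices[0].message.content)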

TW MT-Bench Score


Use Cases

Llama-3-Taiwan-70B can be applied to a wide variety of NLP tasks in Traditional Chinese and English, including:

1. Multi-turn dialogue
System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: Hi there!

Assistant: Hello! How can I help you today?

User: I just want to chat...

Assistant: Of course, I'd be happy to chat with you! Is there anything in particular you'd like to talk about?
2. RAG (retrieval-augmented generation)

Demo: you can enable Search Web on twllm.com

System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: What will the weather be like at NTU tomorrow?

Assistant: According to the weather forecast, tomorrow at NTU there will be showers, with a low of 24°C, east winds of 10 to 15 km/h, and a 40% chance of rain.
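
Answers like this come from injecting retrieved search results into the prompt before generation. A minimal sketch of that pattern is shown below; the snippets, endpoint, and prompt wording are placeholders, not the actual twllm.com pipeline.

# Minimal RAG sketch: prepend retrieved snippets to the user question.
# The snippets, endpoint, and prompt wording are placeholders, not the twllm.com pipeline.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

retrieved_snippets = [
    "Forecast: showers around National Taiwan University tomorrow, low of 24°C.",
    "Winds from the east at 10-15 km/h; 40% chance of rain.",
]
question = "What will the weather be like at NTU tomorrow?"

context = "\n".join(f"- {s}" for s in retrieved_snippets)
prompt = f"Answer the question using the search results below.\n\nSearch results:\n{context}\n\nQuestion: {question}"

reply = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": prompt},
    ],
)
print(reply.choices[0].message.content)
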
3. Formatted output, language understanding, entity recognition, function calling

If you are interested in function calling, I strongly recommend using constrained decoding to turn on JSON mode.
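
As one possible way to do this, the sketch below constrains the output to a JSON schema through vLLM's OpenAI-compatible server. It assumes a vLLM version that supports the guided_json extra parameter; the schema and prompt are illustrative only.

# Sketch of constrained decoding into JSON via vLLM's OpenAI-compatible server.
# Assumes a vLLM version that supports the guided_json extra parameter; the schema is illustrative.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

weather_call_schema = {
    "type": "object",
    "properties": {
        "function": {"type": "string", "enum": ["get_weather"]},
        "arguments": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
            "required": ["city", "date"],
        },
    },
    "required": ["function", "arguments"],
}

reply = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
        {"role": "user", "content": "What will the weather be like in Taipei tomorrow? Respond as a function call."},
    ],
    extra_body={"guided_json": weather_call_schema},  # constrains generation to match the schema
)
print(reply.choices[0].message.content)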

Example from HW7 of Introduction to Generative AI, 2024 Spring, by Hung-yi Lee

System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: You are a movie-review analyst responsible for sentiment analysis.
Analyze each sentence under review, assess the importance of every word and punctuation mark in the sentence,
and list the results as tuples.
Each tuple value ranges from -1 (negative) to 1 (positive).
Then, classify the review as positive (1) or negative (-1).
The sentences to be reviewed are enclosed between the tags.

Example output:
[(<word or punctuation mark>, ), ...]

According to foreign media reports, after Honduras broke off diplomatic relations with Taiwan in March last year and switched recognition to China, it has waited in vain for Chinese financial aid; the agreed imports of Honduran white shrimp have also been delayed for a long time, and the price was even cut to half of what Taiwan used to pay. On top of that, Mexico has also suspended its white-shrimp trade with Honduras, which for local shrimp farmers can only be described as adding insult to injury.

Assistant: [('According to', 0.2), ('foreign media', 0.1), ('reports', -0.1), (',', 0.05), ('Honduras', -0.3), ('broke off relations with Taiwan in March last year,', -0.5), ('and switched recognition to China,', -0.4), ('but has waited in vain for Chinese financial aid,', -1),
('the agreed imports of Honduran white shrimp have also been delayed for a long time,', -1),
('the price was even cut to half of what Taiwan used to pay; Mexico has also suspended its white-shrimp trade with Honduras, adding insult to injury for local shrimp farmers.', -1)]
-1

Get Started

Caveat: a system message should always be set.

Hugging Face Transformers library

You can use Llama-3-Taiwan-70B with the Hugging Face Transformers library:

import torch
from transformers import pipeline, StoppingCriteria

# Define a custom stopping criteria class
class EosListStoppingCriteria(StoppingCriteria):
    def __init__(self, eos_sequence=[128256]):
        self.eos_sequence = eos_sequence

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
        return self.eos_sequence in last_ids

# Initialize the model with automatic device mapping
llm = pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1", device_map="auto")
tokenizer = llm.tokenizer

# Define a conversation example
chat = [
    {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
    {"role": "user", "content": "Hello, may I ask what tasks you can help with?"},
    {"role": "assistant", "content": "Hello! I can help you solve all kinds of problems, provide information, and assist with many different tasks. For example: answering technical questions, giving suggestions, translating text, finding information, or helping you plan an itinerary. Please tell me how I can help you."},
    {"role": "user", "content": "That's great!"}
]
flatten_chat_for_generation = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
"""
<|im_start|>user
You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.<|im_end|>
<|im_start|>user
Hello, may I ask what tasks you can help with?<|im_end|>
<|im_start|>assistant
Hello! I can help you solve all kinds of problems, provide information, and assist with many different tasks. For example: answering technical questions, giving suggestions, translating text, finding information, or helping you plan an itinerary. Please tell me how I can help you.<|im_end|>
<|im_start|>user
That's great!<|im_end|>
<|im_start|>assistant

"""

# Generate a response using the custom stopping criteria
output = llm(flatten_chat_for_generation, return_full_text=False, max_new_tokens=128, top_p=0.9, temperature=0.7, stopping_criteria=[EosListStoppingCriteria([tokenizer.eos_token_id])])
print(output[0]['generated_text'])
"Thank you! I'm glad to be able to help. If there's anything else you need assistance with, please feel free to contact me at any time. I'll do my best to provide the support you need."

vLLM

Start the server

export NUM_GPUS=4
export PORT=8000

docker run \
  -e HF_TOKEN=$HF_TOKEN \
  --gpus '"device=0,1,2,3"' \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p "${PORT}:8000" \
  --ipc=host \
  vllm/vllm-openai:v0.4.0.post1 \
  --model "yentinglin/Llama-3-Taiwan-70B-Instruct-rc1" \
  -tp "${NUM_GPUS}"

Sample client code; alternatively, you can use any OpenAI-API-compatible client:

# pip install "openai>=1.0.0"
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct-rc1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
print("Chat response:", chat_response)

Enjoy exploring the capabilities of Llama-3-Taiwan-70B! We look forward to seeing what you create with this powerful open-source model. If you have any questions or feedback, please let us know.

Contributions

  • Professor Yun-Nung (Vivian) Chen, for her guidance and advisement throughout the project.
  • Wei-Lin Chen, for leading our pretraining data pipeline.
  • Tzu-Han Lin, for synthetic data generation.
  • Chang-Sheng Kao, for enhancing our synthetic data quality.
  • Kang-Chieh Chen, for cleaning instruction-following data.
  • Min-Yi Chen and Shao-Heng Hsu, for collecting chemical engineering data and benchmarks.
  • Chung-Yao Ma, Jonathan Guo, and Kai-Chun Chang, for collecting manufacturing and electrical engineering data and benchmarks, and for project progress management.

Citation

@article{DBLP:journals/corr/abs-2311-17487,
  author       = {Yen{-}Ting Lin and
                  Yun{-}Nung Chen},
  title        = {Taiwan {LLM:} Bridging the Linguistic Divide with a Culturally Aligned
                  Language Model},
  journal      = {CoRR},
  volume       = {abs/2311.17487},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2311.17487},
  doi          = {10.48550/ARXIV.2311.17487},
  eprinttype    = {arXiv},
  eprint       = {2311.17487},
  timestamp    = {Tue, 05 Dec 2023 14:40:42 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2311-17487.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2403-20180,
  author       = {Po{-}Heng Chen and
                  Sijia Cheng and
                  Wei{-}Lin Chen and
                  Yen{-}Ting Lin and
                  Yun{-}Nung Chen},
  title        = {Measuring Taiwanese Mandarin Language Understanding},
  journal      = {CoRR},
  volume       = {abs/2403.20180},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2403.20180},
  doi          = {10.48550/ARXIV.2403.20180},
  eprinttype    = {arXiv},
  eprint       = {2403.20180},
  timestamp    = {Wed, 10 Apr 2024 17:37:45 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2403-20180.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}