Taishi-N324 committed
Commit 2ffaf2b · verified · 1 Parent(s): 161b3d5

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -187,9 +187,9 @@ The following instruction datasets were used for the instruction tuning.
    The first-turn user instructions were translated into Japanese via DeepL machine translation, and the assistant responses were generated using the Llama 3.1 405B Instruct model. Rejection sampling (n=6) was applied, with Llama 3.1 70B Instruct serving as a judge.
    - As implied by the dataset name, conversations that contain personally identifiable information (PII) or template-based user instructions have been removed. Duplicate instructions have also been removed.
    - `filtered-magpie-ultra-ja`
-   - A Japanese variant of the `filtered-magpie-ultra-en` dataset, machine-translated into Japanese using the Gemma 2 27B IT.
+   - A Japanese variant of the `filtered-magpie-ultra-en` dataset, machine-translated into Japanese using [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it).
    - `gemma-magpie`
-   - Japanese Q&A dataset on diverse topics, generated using prompts with specific category words, with answers by Gemma 2 27B IT, heuristically filtered for quality and length.
+   - A Japanese synthetic Q&A dataset generated from scratch using [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). User instructions were created with topic-specific prompts, and assistant responses were generated for these instructions. The conversations were then heuristically filtered for quality and length.
    - English
    - `lmsys-chat-1m-synth-en-wo-pii-and-template-instructions`
    - Similar to `lmsys-chat-1m-synth-ja-wo-pii-and-template-instructions`, but this version uses the original English user instructions. The assistant responses were generated in English as well. Rejection sampling was not applied in this version.
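The rejection-sampling setup described in the context lines of this diff (n=6 candidate responses per instruction, with a separate judge model picking the best one) can be sketched as below. This is a minimal illustration, not code from this repository: `generate` and `judge_score` are hypothetical callables standing in for calls to Llama 3.1 405B Instruct (generator) and Llama 3.1 70B Instruct (judge).

```python
def rejection_sample(instruction, generate, judge_score, n=6):
    """Keep the candidate response the judge scores highest.

    `generate` and `judge_score` are hypothetical stand-ins for the
    generator and judge models described in the README; a real pipeline
    would wrap actual model inference calls here.
    """
    # Sample n candidate responses for the same instruction.
    candidates = [generate(instruction) for _ in range(n)]
    # Retain only the candidate with the highest judge score.
    return max(candidates, key=lambda resp: judge_score(instruction, resp))

# Toy usage with deterministic dummy stand-ins for the two models:
canned = iter(["ok", "better answer", "meh"])
dummy_generate = lambda q: next(canned)
dummy_judge = lambda q, a: len(a)  # toy judge: longer response scores higher
best = rejection_sample("What is DeepL?", dummy_generate, dummy_judge, n=3)
# best == "better answer" (longest candidate, so highest toy judge score)
```

In the actual dataset build, per the README text, this selection was applied only to the Japanese variant; the English variant kept the raw generations without rejection sampling.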