usui2024/llm-jp-3-13b-dpo_w_u_original
@@ -1,199 +1,240 @@
 ---
-library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+base_model: llm-jp/llm-jp-3-13b
+tags:
+- text-generation-inference
+- transformers
+- unsloth
+- llama
+- trl
+license: apache-2.0
+language:
+- en
 ---
 # Model Card for Model ID
+usui2024/llm-jp-3-13b-dpo_w_100usud
 <!-- Provide a quick summary of what the model is/does. -->
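+# Fine-Tuning an LLM-JP Model and Training It with DPO
+## Overview
+This project describes how to fine-tune an LLM-JP model and use DPO (Direct Preference Optimization) training for task generation and response generation. New tasks are generated from a dataset modeled on the ELYZA tasks. The approach is mainly useful for automatically generating training data for natural language processing tasks.
+## Usage
+1. After loading the model, start task generation.
+    - Use the `datasets` module to fetch tasks from `ELYZA-tasks-100`.
+    - Use the model to generate new tasks based on those tasks.
+    - Use the generated tasks as prompts for generating responses with the model.
+2. DPO (Direct Preference Optimization) training:
+    - Run DPO training on the generated tasks and responses.
+    - Configure training with `DPOTrainer`.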
+    ```python
+    from trl import DPOTrainer
+    trainer = DPOTrainer(model, args=args, train_dataset=dpo_datasets)
+    trainer.train()
+    ```
+3. Use the trained model to run inference on tasks.
+## Model Details
+- **Developed by:** usui2040
+- Model name: **LLM-JP**
+- Base model: **llm-jp-3-13b**
+- Fine-tuning method: **LoRA** (Low-Rank Adaptation)
+- Training tasks: **task generation, response generation**
+- Algorithm: **DPO (Direct Preference Optimization)**
+- Tokenizer: **AutoTokenizer** (provided by Hugging Face)
+- Output format: **text generation**
+
+This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
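+The model is a large-scale transformer model targeting Japanese. It can generate text and responses for a given task, and it adapts flexibly to whatever generation task is specified.
+## Model Setup and Training
+This section describes how to train the model, evaluate it, and produce outputs on Hugging Face.
+### Installing the Required Libraries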
+```bash
+pip install -U ipywidgets
+pip install transformers==4.46.3
+pip install -U bitsandbytes
+pip install -U accelerate
+pip install -U datasets
+pip install -U peft==0.13.2
+pip install -U trl==0.12.1
+```
+### Model Setup and Loading
+Specify the base model and load it. This example uses llm-jp-3-13b.
+```python
+import os
+import torch
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+)
+from peft import PeftModel
+# Hugging Face access token; reading it from the HF_TOKEN environment variable is an assumption
+HF_TOKEN = os.environ.get("HF_TOKEN")
+# Model settings
+base_model_id = "llm-jp/llm-jp-3-13b"  # model ID or local path
+adapter_id = "usui2024/20241211_w_llm-jp-3-13b-it_lora"  # LoRA adapter ID
+# QLoRA (4-bit quantization) settings
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    quantization_config=bnb_config,
+    device_map="auto",
+    token=HF_TOKEN,
+)
+tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True, token=HF_TOKEN)
+# Attach the LoRA adapter to the base model
+model = PeftModel.from_pretrained(model, adapter_id, token=HF_TOKEN)
+```
+### Generating Synthetic Data
+Next, use the model to generate synthetic data. The following uses the ELYZA-tasks-100 data to generate new tasks and then creates the model's answers to them.
+```python
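+import pandas as pd
+import torch
+from datasets import load_dataset
+from tqdm import tqdm
+# Load the dataset
+datasets = load_dataset("elyza/ELYZA-tasks-100")
+task_results = []
+# Generate new tasks from each reference task
+for ref_input in tqdm(datasets['test']['input']):
+    # Few-shot prompt in Japanese: "Following the reference task shown below, generate a similar task."
+    prompt = f"""以下に示す参考タスクに従って、類似したタスクを生成しなさい。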
+    ## 参考タスク
+    仕事の熱意を取り戻すためのアイデアを5つ挙げてください。
+    ## 類似タスク
+    試合に挑む心構えを3つほど挙げてください。
+    ## 参考タスク
+    {ref_input}
+    ## 類似タスク
+    """
+    tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+    attention_mask = torch.ones_like(tokenized_input)
+    with torch.no_grad():
+        outputs = model.generate(
+            tokenized_input,
+            attention_mask=attention_mask,
+            max_new_tokens=100,
+            num_return_sequences=3,  # generate three new tasks from the same reference task
+            do_sample=True,
+            temperature=0.6,
+            top_p=0.9,
+            repetition_penalty=1.2,
+            pad_token_id=tokenizer.eos_token_id
+        )
+    output_texts = [tokenizer.decode(output[tokenized_input.size(1):], skip_special_tokens=True) for output in outputs]
+    new_task = {"reference_task": ref_input}
+    new_task.update({f"similar_task_{i}": output_text for i, output_text in enumerate(output_texts)})
+    task_results.append(new_task)
+df = pd.DataFrame(task_results)
+df.head()
+```
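Because the few-shot prompt ends with an open `## 類似タスク` header, sampled continuations may run on into further `##` sections or repeat one another. The card does not show a clean-up step, so the following is only a minimal sketch, assuming that the text before the next `##` marker is the new task. `extract_first_task` and `clean_similar_tasks` are hypothetical helper names, not part of the original code.

```python
def extract_first_task(generated: str, marker: str = "##") -> str:
    """Return the text before the first marker, with indentation and blank lines removed."""
    head = generated.split(marker, 1)[0]
    lines = [line.strip() for line in head.splitlines() if line.strip()]
    return "\n".join(lines)


def clean_similar_tasks(row: dict) -> list[str]:
    """Collect the distinct, non-empty tasks from the similar_task_* fields of one row."""
    tasks: list[str] = []
    for key, value in row.items():
        if not key.startswith("similar_task_"):
            continue
        task = extract_first_task(value)
        if task and task not in tasks:
            tasks.append(task)
    return tasks


# Example row shaped like the entries appended to task_results
example_row = {
    "reference_task": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。",
    "similar_task_0": "    好きな本を3冊挙げてください。\n    ## 参考タスク\n    別のタスク",
    "similar_task_1": "好きな本を3冊挙げてください。",
    "similar_task_2": "## 参考タスク",
}
cleaned_tasks = clean_similar_tasks(example_row)
```

In this example, the first two fields reduce to the same task and the third is empty after trimming, so a single task remains.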
+### DPO (Direct Preference Optimization) Training
+Use DPO to train the model on pairs of generated tasks and answers.
+```python
+from trl import DPOConfig, DPOTrainer
+from datasets import Dataset
+import torch
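+# Prepare the dataset for DPO.
+# `dpo_datasets` is assumed to be a list of dicts with "prompt", "chosen", and "rejected" keys built beforehand.
+dpo_datasets = Dataset.from_list(dpo_datasets)
+# DPO settings; `new_model_id` (the output directory name) must be defined beforehand
+training_args = DPOConfig(
+    output_dir=new_model_id,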
+    per_device_train_batch_size=1,
+    gradient_accumulation_steps=4,
+    num_train_epochs=2,
+    logging_steps=10,
+    save_steps=100,
+    save_total_limit=1,
+    learning_rate=1.5e-4,
+    fp16=True,
+)
+# Set up the DPO trainer; `peft_config` (for example a peft `LoraConfig`) must be defined beforehand
+dpo_trainer = DPOTrainer(
+    model,
+    args=training_args,
+    train_dataset=dpo_datasets,
+    tokenizer=tokenizer,
+    peft_config=peft_config,
+)
+# Run training
+dpo_trainer.train()
+```
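The `dpo_datasets` list passed to `Dataset.from_list` is not constructed anywhere in the card. As a minimal sketch, assuming each task has several sampled answers and some way to score them, preference records could be assembled as follows. `build_preference_records` and `score_fn` are hypothetical names, and the length-based scorer in the usage lines is only a placeholder, not the ranking method used for this model. The `prompt`, `chosen`, and `rejected` keys are the columns that TRL's `DPOTrainer` expects in a preference dataset.

```python
from typing import Callable


def build_preference_records(
    prompts: list[str],
    candidates: list[list[str]],
    score_fn: Callable[[str], float],
) -> list[dict]:
    """Build prompt/chosen/rejected records from sampled answers.

    For each prompt, the highest-scoring answer becomes "chosen" and the
    lowest-scoring one becomes "rejected". Prompts whose best and worst
    answers score the same are skipped: they carry no preference signal.
    """
    records = []
    for prompt, answers in zip(prompts, candidates):
        if len(answers) < 2:
            continue  # a preference pair needs at least two answers
        ranked = sorted(answers, key=score_fn, reverse=True)
        best, worst = ranked[0], ranked[-1]
        if score_fn(best) == score_fn(worst):
            continue
        records.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return records


# Placeholder scorer (assumption): prefer longer answers.
example_prompts = ["### 指示:\n好きな季節を教えてください。\n### 回答:\n"]
example_candidates = [["春です。", "春です。桜が咲くからです。", "春"]]
dpo_records = build_preference_records(example_prompts, example_candidates, score_fn=len)
```

A list built this way has the shape that `Dataset.from_list` takes in the training code. In practice the scorer would be replaced by human judgment or a stronger model's ranking.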
+### Inference
+Finally, here is inference code that feeds tasks to the model and collects the results.
+```python
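+import json
+import torch
+from tqdm import tqdm
+# Load the dataset (a record may span several lines, so accumulate text until a closing brace)
+datasets = []
+with open("./elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
+    item = ""
+    for line in f:
+        line = line.strip()
+        item += line
+        if item.endswith("}"):
+            datasets.append(json.loads(item))
+            item = ""
+task_results = []
+outputs_results = []
+for task in tqdm(datasets):
+    # Assumes each record stores the instruction text under the "input" key
+    # Prompt format in Japanese: 指示 = instruction, 回答 = answer
+    prompt = f"### 指示:\n{task['input']}\n### 回答:\n"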
+    tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+    attention_mask = torch.ones_like(tokenized_input)
+    with torch.no_grad():
+        outputs = model.generate(
+            tokenized_input,
+            attention_mask=attention_mask,
+            max_new_tokens=512,
+            num_return_sequences=3,  # generate at least 2 outputs per task (3 here)
+            do_sample=True,
+            temperature=0.6,
+            top_p=0.9,
+            repetition_penalty=1.2,
+            pad_token_id=tokenizer.eos_token_id
+        )
+    output_texts = [tokenizer.decode(output[tokenized_input.size(1):], skip_special_tokens=True) for output in outputs]
+    outputs_results.append(output_texts)
+# Display the results
+for result in outputs_results:
+    print(result)
+```