---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# How to Run this Model
Basically, you can just load it as a regular Hugging Face model.

**Place elyza-tasks-100-TV_0.jsonl in the same folder in advance.**

**Don't forget to replace HF_TOKEN with your own token.**
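
If you prefer not to hard-code the token in the script, one alternative (a minimal sketch using huggingface_hub, which is installed alongside transformers) is to log in once per session:

```
from huggingface_hub import login

# Authenticate this session; the token= arguments used later then become optional.
login(token="YOUR_HF_TOKEN")  # placeholder: supply your own token
```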


Environment setup

```
!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U accelerate
!pip install -U datasets
```
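
To double-check that the upgraded packages are the ones actually loaded (an optional sanity check), print their versions:

```
import accelerate
import bitsandbytes
import datasets
import transformers

# Confirm the upgraded packages are active in this runtime.
print(transformers.__version__, accelerate.__version__,
      datasets.__version__, bitsandbytes.__version__)
```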



Example code for generating the results JSONL

The inference results will be written to llm-jp-3-13b-it-outputs.jsonl.
```
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
import torch
from tqdm import tqdm
import json

HF_TOKEN = "ADD YOUR OWN TOKEN"  # placeholder: replace with your Hugging Face token
model_name = "AlHfac/llm-jp-3-13b-it"

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    token=HF_TOKEN,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=HF_TOKEN)

# Load questions: accumulate lines until a complete JSON object is
# formed, so records that happen to span multiple lines are handled too.
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Generate results with the loaded model
results = []
for data in tqdm(datasets):
    input_text = data["input"]

    # Keep the prompt lines flush left: leading whitespace inside a
    # triple-quoted f-string becomes part of the prompt itself.
    prompt = f"""### 指示
{input_text}
### 回答:
"""

    tokenized_input = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            tokenized_input,
            max_new_tokens=100,
            do_sample=False,
            repetition_penalty=1.2,
        )[0]
    output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)

    results.append({"task_id": data["task_id"], "input": input_text, "output": output})


# Write the results to a JSONL file named after the model
import re
model_name = re.sub(".*/", "", model_name)  # keep only the part after the repo namespace
with open(f"./{model_name}-outputs.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII characters
        f.write('\n')

```
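
To sanity-check the generated file (a quick snippet that is not part of the script above; the file name assumes the default model_name):

```
import json

# Read the generated file back and preview the first record.
with open("./llm-jp-3-13b-it-outputs.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(f"{len(rows)} results")
print(rows[0]["task_id"], rows[0]["output"][:80])
```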

`The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results. Setting pad_token_id to eos_token_id:2 for open-end generation.`
Logs like this can safely be ignored.
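
If you want to silence the warning instead, one option (a sketch that would replace the tokenizer.encode / model.generate lines in the loop above) is to pass the attention mask and a pad token id explicitly:

```
# Drop-in replacement for the encode/generate lines; when decoding,
# slice with tokenized["input_ids"].size(1) instead of tokenized_input.size(1).
tokenized = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids=tokenized["input_ids"],
        attention_mask=tokenized["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=100,
        do_sample=False,
        repetition_penalty=1.2,
    )[0]
```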

# Model Training Information

- **Developed by:** AlHfac
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)