---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** erikomaru
- **License:** apache-2.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

# Sample use

The script below uses the QLoRA adapter trained with the Unsloth library to generate outputs for the ELYZA-tasks-100-TV benchmark tasks. It assumes the adapter has been uploaded to Hugging Face. The base model is loaded with 4-bit NF4 quantization to keep memory usage low, the fine-tuned LoRA adapter is attached with PEFT, and the model is then switched to Unsloth's fast inference mode. The adapter is intended for Japanese natural language processing tasks that require precise instruction following.

```python
# Import necessary libraries
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
import json
import re

# Set your Hugging Face token and the model/adapter IDs
HF_TOKEN = "your-token"  # Replace with your token
model_id = "llm-jp/llm-jp-3-13b"          # Base model
adapter_id = "erikomaru/llm-jp-3-13b-it"  # Fine-tuned LoRA adapter

# Step 1: Configure 4-bit quantization settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Step 2: Load the base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=HF_TOKEN,
)

# Step 3: Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=HF_TOKEN)

# Integrate the LoRA adapter into the base model
model = PeftModel.from_pretrained(model, adapter_id, token=HF_TOKEN)

# Step 4: Load the dataset
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Step 5: Run inference on the dataset
results = []
for data in tqdm(datasets):
    input = data["input"]

    # Construct the prompt (指示 = instruction, 回答 = answer)
    prompt = f"""### 指示
{input}
### 回答
"""

    # Tokenize the input
    tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    attention_mask = torch.ones_like(tokenized_input)

    # Generate output
    with torch.no_grad():
        outputs = model.generate(
            tokenized_input,
            attention_mask=attention_mask,
            max_new_tokens=100,
            do_sample=False,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )[0]
    output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)

    results.append({"task_id": data["task_id"], "input": input, "output": output})

# Step 6: Save results to a JSONL file
jsonl_id = re.sub(".*/", "", adapter_id)
with open(f"./{jsonl_id}-outputs.jsonl", "w", encoding="utf-8") as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII (Japanese) characters
        f.write("\n")
```
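
Because the adapter was trained with Unsloth, it can alternatively be loaded in one step with `FastLanguageModel.from_pretrained`, which handles the 4-bit quantization internally so no explicit `BitsAndBytesConfig` is needed. The following is a minimal sketch under assumptions not stated above: the `max_seq_length` value and the sample prompt are illustrative only.

```python
# Alternative loading path via Unsloth (minimal sketch).
# NOTE: max_seq_length and the sample prompt are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="erikomaru/llm-jp-3-13b-it",  # LoRA adapter repo; the base model is resolved from its config
    max_seq_length=2048,                     # assumption: adjust to the context length you need
    dtype=None,                              # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,                       # 4-bit quantization, matching the script above
    token="your-token",                      # your Hugging Face token
)
FastLanguageModel.for_inference(model)       # enable Unsloth's faster inference mode

# Same prompt format as the script above; the task text is an illustrative example.
prompt = "### 指示\n日本の有名な観光地を3つ挙げてください。\n### 回答\n"
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False, repetition_penalty=1.2)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```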