Commit fcbdcbd by Lyte (verified) · 1 parent: a553fc1
Files changed (1): README.md (+76 −1)
README.md CHANGED
- llama
- trl
- sft
datasets:
- Lyte/Reasoning-Paused
pipeline_tag: text-generation
---

# Model Information:

- This model was trained on a dataset with the following columns: an initial reasoning/assessment, step-by-step reasoning, a verification after each step, and a final answer presented using the full context. Is it better than the original base model? I don't know; I am not sure I can run evals on it, and I can't afford to run them manually.
- The model will essentially (over)think for longer before answering you. It's best to use a minimum of 4k and up to 16k context to give it room to (over)think; it was trained with 32k context.
- From my manual testing so far, the model seems to do better at chatting (mental health, safety, creativity, etc.). Honestly, the best I can tell you is to test it yourself using this [Colab Notebook](https://colab.research.google.com/drive/1dcBbHAwYJuQJKqdPU570Hddv_F9wzjPO?usp=sharing).
- The dataset I have made public is not the full dataset used. The dataset was originally meant for something entirely different, using a custom MoE architecture; unfortunately, I cannot afford to run that experiment.
- KingNish re-ignited my passion to pick this back up after I had given up on it following the first attempt I shared a month or so ago, so cheers, and enjoy the toy.

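To make the training format described above concrete, here is a minimal sketch of what one row might look like. The field names are my assumption based on the column description, not the actual schema of `Lyte/Reasoning-Paused`:

```python
# Hypothetical sketch of one training example; field names are assumed,
# not taken from the actual dataset schema.
example = {
    "initial_reasoning": "Assess the problem and outline an approach.",
    "steps": [
        {"step": "Work out the first intermediate result.",
         "verification": "Check the intermediate result is consistent."},
        {"step": "Combine the intermediate results.",
         "verification": "Confirm the combination answers the question."},
    ],
    "final_answer": "Present the answer using the full context above.",
}

# Each step is immediately followed by its verification, mirroring how the
# model is expected to (over)think before presenting the final answer.
for item in example["steps"]:
    print(item["step"], "->", item["verification"])
print(example["final_answer"])
```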
# Inference Code:

- Feel free to hide the initial reasoning, the steps, and the verifications, showing only the final answer, to get an o1-style feel (I don't know).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Lyte/Llama-3.2-3B-Overthinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

def generate_response(prompt, max_tokens=16384, temperature=0.8, top_p=0.95, repeat_penalty=1.1, num_steps=3):
    messages = [{"role": "user", "content": prompt}]

    # Stage 1: generate the initial reasoning/assessment
    reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
    reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
    reasoning_ids = model.generate(
        **reasoning_inputs,
        max_new_tokens=max_tokens // 3,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    # Stage 2: generate the thinking stage (step-by-step reasoning with verifications)
    messages.append({"role": "reasoning", "content": reasoning_output})
    thinking_template = tokenizer.apply_chat_template(messages, tokenize=False, add_thinking_prompt=True, num_steps=num_steps)
    thinking_inputs = tokenizer(thinking_template, return_tensors="pt").to(model.device)
    thinking_ids = model.generate(
        **thinking_inputs,
        max_new_tokens=max_tokens // 3,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    thinking_output = tokenizer.decode(thinking_ids[0, thinking_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    # Stage 3: generate the final answer from the full context
    messages.append({"role": "thinking", "content": thinking_output})
    answer_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    answer_inputs = tokenizer(answer_template, return_tensors="pt").to(model.device)
    answer_ids = model.generate(
        **answer_inputs,
        max_new_tokens=max_tokens // 3,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    answer_output = tokenizer.decode(answer_ids[0, answer_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    return reasoning_output, thinking_output, answer_output

# Example usage:
prompt = "Explain the process of photosynthesis."
reasoning, thinking, answer = generate_response(prompt, num_steps=5)

print("Reasoning:", reasoning)
print("Thinking:", thinking)
print("Answer:", answer)
```
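If you want the o1-style presentation mentioned above, one option in a markdown-rendering chat UI is to wrap the intermediate stages in a collapsible block so only the final answer is visible by default. A minimal, self-contained sketch; the `format_o1_style` helper is hypothetical, not part of the model or the transformers API:

```python
# Hypothetical helper: hide the reasoning and thinking stages inside an HTML
# <details> block so a markdown renderer shows only the final answer by default.
def format_o1_style(reasoning: str, thinking: str, answer: str) -> str:
    hidden = (
        "<details><summary>Show reasoning</summary>\n\n"
        f"{reasoning}\n\n{thinking}\n\n"
        "</details>"
    )
    return f"{hidden}\n\n{answer}"

print(format_o1_style(
    "Assess the question and plan an explanation.",
    "Step 1: describe light absorption. Verify: consistent with chlorophyll's role.",
    "Photosynthesis converts light energy into chemical energy stored as glucose.",
))
```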

# Uploaded model

- **Developed by:** Lyte

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)