tags:
- llama
- trl
- sft
datasets:
- Lyte/Reasoning-Paused
pipeline_tag: text-generation
---

# Model Information:

- This model was trained on a dataset with the following columns: an initial reasoning/assessment, step-by-step reasoning, a verification after each step, and a final answer presented from the full context. Is it better than the original base model? I don't know; I'm not sure I can run evals on it, and I can't afford to run them manually.
- The model will basically (over)think for longer before answering you. It's best to use a minimum of 4k context, or up to 16k, to give it room to (over)think; it was trained with a 32k context.
- From my manual testing so far, the model seems to do better at chatting (mental health, safety, creativity, etc.). Honestly, the best I can tell you is to test it yourself using this [Colab Notebook](https://colab.research.google.com/drive/1dcBbHAwYJuQJKqdPU570Hddv_F9wzjPO?usp=sharing).
- The dataset I have made public is not the full dataset used, and it was originally meant for something entirely different: a custom MoE architecture experiment that I unfortunately cannot afford to run.
- KingNish re-ignited my passion to pick this back up after I had given up on it following the first attempt I shared a month or so ago, so cheers, and enjoy the toy.
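For illustration, a row with the column structure described above might look like the following sketch. The field names and text here are invented for this example; the real schema is on the dataset card.

```python
# Hypothetical example row illustrating the four-part structure described
# above; the actual field names in Lyte/Reasoning-Paused may differ.
example_row = {
    "initial_reasoning": "The question asks about photosynthesis, so I should cover both stages ...",
    "steps": [
        {"step": "The light-dependent reactions split water and produce ATP/NADPH ...",
         "verification": "Checked: this matches the role of photosystem II."},
        {"step": "The Calvin cycle uses that ATP/NADPH to fix CO2 into sugar ...",
         "verification": "Checked: consistent with the previous step."},
    ],
    "final_answer": "Photosynthesis converts light energy into chemical energy in two linked stages ...",
}

# Each step carries its own verification, and the final answer is produced
# from the full accumulated context.
assert all({"step", "verification"} <= set(s) for s in example_row["steps"])
```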
# Inference Code:

- Feel free to hide the initial reasoning, the steps, and the verifications, and show only the final answer to get an o1-style feel (I don't know).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Lyte/Llama-3.2-3B-Overthinker"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

def generate_response(prompt, max_tokens=16384, temperature=0.8, top_p=0.95, repeat_penalty=1.1, num_steps=3):
    messages = [{"role": "user", "content": prompt}]

    # Generate reasoning
    reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
    reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)

    reasoning_ids = model.generate(
        **reasoning_inputs,
        max_new_tokens=max_tokens // 3,
        do_sample=True,  # enable sampling so temperature/top_p take effect
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    # Generate thinking (step-by-step and verifications)
    messages.append({"role": "reasoning", "content": reasoning_output})
    thinking_template = tokenizer.apply_chat_template(messages, tokenize=False, add_thinking_prompt=True, num_steps=num_steps)
    thinking_inputs = tokenizer(thinking_template, return_tensors="pt").to(model.device)

    thinking_ids = model.generate(
        **thinking_inputs,
        max_new_tokens=max_tokens // 3,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    thinking_output = tokenizer.decode(thinking_ids[0, thinking_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    # Generate final answer
    messages.append({"role": "thinking", "content": thinking_output})
    answer_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    answer_inputs = tokenizer(answer_template, return_tensors="pt").to(model.device)

    answer_ids = model.generate(
        **answer_inputs,
        max_new_tokens=max_tokens // 3,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=repeat_penalty
    )
    answer_output = tokenizer.decode(answer_ids[0, answer_inputs.input_ids.shape[1]:], skip_special_tokens=True)

    return reasoning_output, thinking_output, answer_output

# Example usage:
prompt = "Explain the process of photosynthesis."
reasoning, thinking, answer = generate_response(prompt, num_steps=5)

print("Reasoning:", reasoning)
print("Thinking:", thinking)
print("Answer:", answer)
```
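To get the o1-style feel mentioned above, the intermediate stages can be hidden and only the final answer shown. Here is a minimal sketch; the `format_response` helper and its collapsible `<details>` markdown wrapper are my own illustration, not part of the model or library API:

```python
def format_response(reasoning: str, thinking: str, answer: str, show_work: bool = False) -> str:
    """Return only the final answer by default; optionally include the
    reasoning and step-by-step work in a collapsible <details> block."""
    if not show_work:
        return answer
    work = f"**Reasoning:**\n{reasoning}\n\n**Steps & verifications:**\n{thinking}"
    return f"<details><summary>Show work</summary>\n\n{work}\n\n</details>\n\n{answer}"

# With real outputs this would be:
#   reasoning, thinking, answer = generate_response(prompt, num_steps=5)
#   print(format_response(reasoning, thinking, answer))
print(format_response("initial assessment ...", "step 1 ... verified ...", "Final answer."))  # prints "Final answer."
```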
# Uploaded model

- **Developed by:** Lyte

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)