+---
+license: apache-2.0
+datasets:
+- RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
+language:
+- tr
+- en
+base_model:
+- ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
+new_version: RefinedNeuro/RN_TR_R1
+library_name: transformers
+tags:
+- unsloth
+- llama
+- trl
+- reasoning
+- bilingual
+- stem
+---
+## Model Overview
+**RefinedNeuro/RN_TR_R2** is a Turkish-language reasoning model fine-tuned from
+`ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` with the GRPO method on Q&A data distilled from Qwen3. It is designed for open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.
+* **🌐 Available on Ollama**: [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)
+---
+## Evaluation
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/680e40dae19765db2aeb9d80/hyOZSykSa609rDDzPpml2.png)
+Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
+(Open-ended Turkish culture & reasoning questions)
+| Model Name                              | Score (%) | Description                                                 |
+| --------------------------------------- | --------- | ----------------------------------------------------------- |
+| **RefinedNeuro/RN_TR_R2**             | **82.4**  | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data   |
+| Qwen3-8B                                | 76.8      | Off-the-shelf Qwen3-8B                                      |
+| RefinedNeuro/RN\_TR\_R1                 | 64.8      | Baseline Turkish-fine-tuned model                           |
+| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2      | Turkish-adapted Llama-8B via direct preference optimization |
+| Meta-Llama-3.1-8B-Instruct              | 63.2      | Meta’s Llama-3.1-8B with instruct tuning                    |
+RN_TR_R2 achieves a **17.6 pp** improvement over the RN_TR_R1 baseline.
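The percentage-point gaps implied by the table can be rechecked with a few lines of Python. This is only an illustrative sketch: the scores are copied from the table above, and the helper name `gap_pp` is hypothetical, not part of any released tooling.

```python
# Scores copied from the benchmark table above (percent correct).
scores = {
    "RefinedNeuro/RN_TR_R2": 82.4,
    "Qwen3-8B": 76.8,
    "RefinedNeuro/RN_TR_R1": 64.8,
    "ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1": 63.2,
    "Meta-Llama-3.1-8B-Instruct": 63.2,
}


def gap_pp(model: str, reference: str) -> float:
    """Percentage-point difference between two scores, rounded to one decimal."""
    return round(scores[model] - scores[reference], 1)


if __name__ == "__main__":
    target = "RefinedNeuro/RN_TR_R2"
    for name in scores:
        if name != target:
            print(f"{target} vs {name}: {gap_pp(target, name):+.1f} pp")
```

Rounding to one decimal avoids floating-point noise in the subtraction, since the table itself reports one decimal place.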
+---
+## Intended Use
+* **Primary use cases**: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
+* **Not recommended**: Creative fiction generation, or tasks that require memorized facts outside its training scope.
+* **Known limitations**: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 can hallucinate.
+---
+## Training Data
+* **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
+* **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
+  * **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
+  * **Grade levels**: 6th–12th grade
+  * **Prompt variations** (e.g., long/detailed vs. short/easy; answer must be a single number)
+  * **Size**: 13 000 examples (\~9 million tokens)
+  * **License**: Creative Commons
+---
+## Training Configuration
+* **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`
+* **Hardware**: 2× RTX 3090 GPUs for 1 week (\~336 GPU-hours)
+* **Batch size**: 1
+* **Context length**: 4 096 tokens (`max_seq_length=8192` at inference)
+* **Optimizer & scheduler**: Cosine LR schedule, peak LR=5×10⁻⁶
+* **Library**: `unsloth==2025.3.18`
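To give a feel for what a cosine schedule with a peak learning rate of 5×10⁻⁶ looks like, here is a minimal sketch. It makes assumptions the configuration above does not state: no warmup phase, decay all the way to zero, and a hypothetical total of 1000 steps. The actual training run may have used different settings.

```python
import math

PEAK_LR = 5e-6  # peak learning rate stated in the configuration above


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step into [0, total_steps] so out-of-range steps stay well defined.
    progress = min(max(step, 0), total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not taken from the model card
    for step in (0, 250, 500, 750, 1000):
        print(f"step {step:4d}: lr = {cosine_lr(step, total):.2e}")
```

Under these assumptions the rate starts at the peak, passes through half the peak at the midpoint, and reaches zero on the final step.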
+---
+## Usage Example
+```bash
+pip install unsloth==2025.3.18
+```
+```python
+from unsloth import FastLanguageModel
+from transformers import TextStreamer
+# Load the model and tokenizer
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="RefinedNeuro/RN_TR_R2",
+    max_seq_length=8192,
+    load_in_4bit=True,
+)
+FastLanguageModel.for_inference(model)
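+# System prompt (Turkish). English gloss: "When answering questions, think step by
+# step and justify your reasoning. First think step by step inside <think></think>
+# tags, then summarize the steps, and finally give the answer inside \boxed{}."
+SYSTEM_PROMPT = """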
+Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
+Önce <think></think> tagları arasında adım adım düşün.
+Sonra düşünme adımlarını özetle.
+En son \\boxed{} içerisinde sorunun cevabını ver.
+"""
+def answer_question(question: str):
+    messages = [
+        {'role': 'system', 'content': SYSTEM_PROMPT},
+        {'role': 'user',   'content': question},
+    ]
+    inputs = tokenizer.apply_chat_template(
+        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
+    ).to("cuda")
+    streamer = TextStreamer(tokenizer)
+    print("-" * 60)
+    _ = model.generate(
+        input_ids=inputs,
+        streamer=streamer,
+        max_new_tokens=2048,
+        do_sample=True,  # enable sampling so temperature and top_p take effect
+        temperature=0.6,
+        top_p=0.1,
+    )
+# Examples (English: "What is the capital of Türkiye?", "What is the product of 6 and 2?")
+answer_question("Türkiye'nin başkenti neredir?")
+answer_question("6 ve 2'nin çarpımı nedir?")
+```
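Because the system prompt asks the model to put its final answer inside `\boxed{}`, downstream code will often want to pull that value out of the generated text. The helper below is a hypothetical sketch, not part of the model's tooling, and it assumes the boxed content contains no nested braces (for example, it would not handle `\boxed{\frac{1}{2}}`).

```python
import re
from typing import Optional

# Matches \boxed{...} where the content has no nested braces (simplifying assumption).
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None if there is none."""
    matches = _BOXED.findall(text)
    return matches[-1].strip() if matches else None


if __name__ == "__main__":
    sample = r"<think> ... \boxed{11} </think> \boxed{11}"
    print(extract_boxed_answer(sample))
```

Taking the last match rather than the first is a design choice: the model may repeat the box inside its `<think>` section, and the final occurrence is the one the prompt designates as the answer.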
+---
+## Sample Q\&A
+```
+> **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
+>
+> **A:** <think>
+1. Öncelikle f(x) fonksiyonunu türevini bulmak için türev kuralını kullanacağız. Türev kuralı, fonksiyonun türevini bulmak için, fonksiyonun x'te türevini bulmak için gerektirir.
+2. f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunu yazalım:
+f(x) = 3x⁴ - 2x³ + 5x - 7
+3. Fonksiyonun türevini bulmak için, fonksiyonu x'e göre türevini alacağız.
+f'(x) = d(3x⁴ - 2x³ + 5x - 7)/dx
+4. Şimdi x'in türevini fonksiyonun içinden alalım.
+f'(x) = d(3x⁴)/dx - d(2x³)/dx + d(5x)/dx - d(7)/dx
+5. Öncelikle x'in türevini bulalım:
+d(3x⁴)/dx = 4 * 3x³ (çünkü 4 * x^(3-1) = 4 * 3x³)
+d(2x³)/dx = 3 * 2x² (çünkü 3 * x^(3-1) = 3 * 2x²)
+d(5x)/dx = 5 (çünkü x'in türevi 1'dir)
+d(7)/dx = 0 (çünkü sabit bir değerdir)
+6. Şimdi fonksiyonun türevini bulmak için bu değerleri yerine koyalım:
+f'(x) = 4 * 3x³ - 3 * 2x² + 5 - 0
+f'(x) = 12x³ - 6x² + 5
+7. Şimdi f(x) fonksiyonunun türevini bulmak için, türevini x=1'deki değerini bulalım.
+f'(1) = 12 * 1³ - 6 * 1² + 5
+f'(1) = 12 - 6 + 5
+f'(1) = 11
+8. Sonuç olarak f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevi 11'dir.
+\boxed{11}
+</think>
+\boxed{11}
+```
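The sample asks for the derivative of f(x) = 3x⁴ - 2x³ + 5x - 7 at x = 1, and the model answers 11. That result can be checked independently with the power rule; the sketch below uses hypothetical helper names and plain Python, with coefficients listed from the highest degree down to the constant term.

```python
from typing import List


def differentiate(coeffs: List[float]) -> List[float]:
    """Apply the power rule to coefficients ordered from highest degree to constant."""
    degree = len(coeffs) - 1
    # The constant term (last entry) differentiates to zero and is dropped.
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def evaluate(coeffs: List[float], x: float) -> float:
    """Evaluate a polynomial with Horner's method (highest-degree coefficient first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


if __name__ == "__main__":
    f = [3, -2, 0, 5, -7]       # 3x^4 - 2x^3 + 0x^2 + 5x - 7
    f_prime = differentiate(f)  # coefficients of 12x^3 - 6x^2 + 5
    print(f_prime, evaluate(f_prime, 1))
```

The computed derivative 12x³ - 6x² + 5 and the value f'(1) = 11 agree with the model's final boxed answer.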
+---
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{farsak2025rn_tr_r2,
+  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
+  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
+  license      = {Apache-2.0}
+}
+```
+---
+## License
+Apache 2.0, a permissive open-source license.
+---
+## Authors
+* **Authors**: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro