RefinedNeuro committed on
Commit
a3b793d
·
verified ·
1 Parent(s): f0a1b9d

Update README.md

  1. README.md +219 -3
README.md CHANGED
@@ -1,3 +1,219 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset
5
+ language:
6
+ - tr
7
+ - en
8
+ base_model:
9
+ - ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1
10
+ new_version: RefinedNeuro/RN_TR_R1
11
+ library_name: transformers
12
+ tags:
13
+ - unsloth
14
+ - llama
15
+ - trl
16
+ - reasoning
17
+ - bilingual
18
+ - stem
19
+ ---
20
+
21
+
22
+ ## Model Overview
23
+
24
+ **RefinedNeuro/RN_TR_R2** is a Turkish‐language reasoning model fine-tuned from
25
+ `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1` using the GRPO method on Q&A data distilled from the Qwen3 model. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.
26
+
27
+ * **🌐 Available on:** [https://ollama.com/RefinedNeuro/RN\_TR\_R2](https://ollama.com/RefinedNeuro/RN_TR_R2)
28
+
29
+ ---
30
+
31
+ ## Evaluation
32
+
33
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/680e40dae19765db2aeb9d80/hyOZSykSa609rDDzPpml2.png)
34
+
35
+ Benchmark: [RN\_TR\_R2\_Benchmark\_Results](https://huggingface.co/datasets/RefinedNeuro/RN_TR_R2_Benchmark_Results)
36
+ (Open-ended Turkish culture & reasoning questions)
37
+
38
+ | Model Name | Score (%) | Description |
39
+ | --------------------------------------- | --------- | ----------------------------------------------------------- |
40
+ | **RefinedNeuro/RN_TR_R2** | **82.4** | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
41
+ | Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
42
+ | RefinedNeuro/RN\_TR\_R1 | 64.8 | Baseline Turkish-fine-tuned model |
43
+ | ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
44
+ | Meta-Llama-3.1-8B-Instruct | 63.2 | Meta’s Llama-3.1-8B with instruct tuning |
45
+
46
+ RN_TR_R2 achieves a **17.6 pp** improvement over the RN_TR_R1 baseline.
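
The gain quoted above is a difference in percentage points, not a relative change. A minimal sketch of the arithmetic, using the scores from the table; the dictionary and function names are illustrative and not part of the model's tooling:

```python
# Scores (%) copied from the benchmark table above.
SCORES = {
    "RefinedNeuro/RN_TR_R2": 82.4,
    "Qwen3-8B": 76.8,
    "RefinedNeuro/RN_TR_R1": 64.8,
    "ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1": 63.2,
    "Meta-Llama-3.1-8B-Instruct": 63.2,
}


def point_gain(model: str, baseline: str) -> float:
    """Absolute difference between two scores, in percentage points."""
    return round(SCORES[model] - SCORES[baseline], 1)


def relative_gain(model: str, baseline: str) -> float:
    """Improvement relative to the baseline score, in percent."""
    diff = SCORES[model] - SCORES[baseline]
    return round(100 * diff / SCORES[baseline], 1)


print(point_gain("RefinedNeuro/RN_TR_R2", "RefinedNeuro/RN_TR_R1"))     # 17.6
print(relative_gain("RefinedNeuro/RN_TR_R2", "RefinedNeuro/RN_TR_R1"))  # 27.2
```

The same helper gives the margin over any other row, for example `point_gain("RefinedNeuro/RN_TR_R2", "Qwen3-8B")`.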
47
+
48
+ ---
49
+
50
+ ## Intended Use
51
+
52
+ * **Primary use cases**: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
53
+ * **Not recommended**: Creative fiction generation, or tasks that require memorized facts outside its training scope.
54
+ * **Known limitations**: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 may hallucinate.
55
+
56
+ ---
57
+
58
+ ## Training Data
59
+
60
+ * **Base model**: `ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1`
61
+ * **Fine-tuning dataset**: [RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset](https://huggingface.co/datasets/RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset)
62
+
63
+ * **Domains**: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
64
+ * **Grade levels**: 6th–12th grade
65
+ * **Prompt variations** (e.g., long/detailed vs. short/easy; answer must be a single number)
66
+ * **Size**: 13 000 examples (\~9 million tokens)
67
+ * **License**: Creative Commons
68
+
69
+ ---
70
+
71
+ ## Training Configuration
72
+
73
+ * **Method**: GRPO (Group Relative Policy Optimization) via `unsloth`
74
+ * **Hardware**: 2× RTX 3090 GPUs for 1 week (\~336 GPU-hours)
75
+ * **Batch size**: 1
76
+ * **Context length**: 4 096 tokens during training (`max_seq_length=8192` at inference)
77
+ * **Optimizer & scheduler**: Cosine LR schedule, peak LR=5×10⁻⁶
78
+ * **Library**: `unsloth==2025.3.18`
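
For readers unfamiliar with the schedule named above, here is a minimal sketch of cosine learning-rate decay with the stated peak of 5×10⁻⁶. The configuration does not give a warmup length, a total step count, or a minimum learning rate, so this sketch assumes no warmup, decay to zero, and a hypothetical 1000 steps; it illustrates the shape of the curve and is not the training code that was used.

```python
import math

PEAK_LR = 5e-6  # peak learning rate stated in the configuration above


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps.

    Assumes no warmup phase (the model card does not mention one).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical value, not taken from the model card
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches the minimum at the final step.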
79
+
80
+ ---
81
+
82
+ ## Usage Example
83
+
84
+ ```bash
85
+ pip install unsloth==2025.3.18
86
+ ```
87
+
88
+ ```python
+ from unsloth import FastLanguageModel
+ from transformers import TextStreamer
+
+ # Load the model and tokenizer
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="RefinedNeuro/RN_TR_R2",
+     max_seq_length=8192,
+     load_in_4bit=True,
+ )
+
+ # Switch the model to inference mode
+ FastLanguageModel.for_inference(model)
+
+ # System prompt (Turkish): think step by step inside <think></think> tags,
+ # then summarize the steps, then give the final answer inside \boxed{}.
+ SYSTEM_PROMPT = """
+ Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
+ Önce <think></think> tagları arasında adım adım düşün.
+ Sonra düşünme adımlarını özetle.
+ En son \\boxed{} içerisinde sorunun cevabını ver.
+ """
+
+ def answer_question(question: str):
+     messages = [
+         {'role': 'system', 'content': SYSTEM_PROMPT},
+         {'role': 'user', 'content': question},
+     ]
+     # Build the chat-formatted prompt and move it to the GPU
+     inputs = tokenizer.apply_chat_template(
+         messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
+     ).to("cuda")
+
+     # Stream generated tokens to stdout as they are produced
+     streamer = TextStreamer(tokenizer)
+     print("-" * 60)
+     _ = model.generate(
+         input_ids=inputs,
+         streamer=streamer,
+         max_new_tokens=2048,
+         temperature=0.6,
+         top_p=0.1,
+     )
+
+ # Examples
+ answer_question("Türkiye'nin başkenti neredir?")
+ answer_question("6 ve 2'nin çarpımı nedir?")
+ ```
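
The system prompt asks for reasoning inside `<think></think>` tags and a final answer inside `\boxed{}`, so the response can be post-processed with a small parser. The following is a minimal sketch, assuming the generated text has already been captured as a string (the usage example above only streams it to stdout) and that the boxed answer contains no nested braces; `split_response` is a hypothetical helper, not part of `unsloth` or `transformers`.

```python
import re


def split_response(text: str):
    """Return (reasoning, final_answer) from a response in the prompted format.

    reasoning is the content of the first <think>...</think> block ("" if absent);
    final_answer is the content of the last \\boxed{...} after that block
    (None if no boxed answer is found).
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # Only look for the final answer after the reasoning block, when there is one.
    tail = text[think.end():] if think else text
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", tail)
    answer = boxed[-1].strip() if boxed else None
    return reasoning, answer


sample = "<think>\n12 - 6 + 5 = 11\n</think>\n\n\\boxed{11}"
reasoning, answer = split_response(sample)
print(answer)  # 11
```

Searching only the text after `</think>` avoids picking up a boxed value that appears inside the reasoning itself.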
131
+
132
+ ---
133
+
134
+ ## Sample Q\&A
135
+
136
+ ```
137
+ > **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
138
+ >
139
+ > **A:** <think>
140
+
141
+ 1. Öncelikle f(x) fonksiyonunu türevini bulmak için türev kuralını kullanacağız. Türev kuralı, fonksiyonun türevini bulmak için, fonksiyonun x'te türevini bulmak için gerektirir.
142
+
143
+ 2. f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunu yazalım:
144
+
145
+ f(x) = 3x⁴ - 2x³ + 5x - 7
146
+
147
+ 3. Fonksiyonun türevini bulmak için, fonksiyonu x'e göre türevini alacağız.
148
+
149
+ f'(x) = d(3x⁴ - 2x³ + 5x - 7)/dx
150
+
151
+ 4. Şimdi x'in türevini fonksiyonun içinden alalım.
152
+
153
+ f'(x) = d(3x⁴)/dx - d(2x³)/dx + d(5x)/dx - d(7)/dx
154
+
155
+ 5. Öncelikle x'in türevini bulalım:
156
+
157
+ d(3x⁴)/dx = 4 * 3x³ (çünkü 4 * x^(3-1) = 4 * 3x³)
158
+
159
+ d(2x³)/dx = 3 * 2x² (çünkü 3 * x^(3-1) = 3 * 2x²)
160
+
161
+ d(5x)/dx = 5 (çünkü x'in türevi 1'dir)
162
+
163
+ d(7)/dx = 0 (çünkü sabit bir değerdir)
164
+
165
+ 6. Şimdi fonksiyonun türevini bulmak için bu değerleri yerine koyalım:
166
+
167
+ f'(x) = 4 * 3x³ - 3 * 2x² + 5 - 0
168
+
169
+ f'(x) = 12x³ - 6x² + 5
170
+
171
+ 7. Şimdi f(x) fonksiyonunun türevini bulmak için, türevini x=1'deki değerini bulalım.
172
+
173
+ f'(1) = 12 * 1³ - 6 * 1² + 5
174
+
175
+ f'(1) = 12 - 6 + 5
176
+
177
+ f'(1) = 11
178
+
179
+ 8. Sonuç olarak f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevi 11'dir.
180
+
181
+ \boxed{11}
182
+
183
+ </think>
184
+
185
+ \boxed{11}
186
+ ```
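
The final result in the sample can be checked independently. Some intermediate justifications in the model's output are loosely worded (for example, the exponent shown for the x⁴ term), but the derivative 12x³ - 6x² + 5 and the value 11 are correct. A minimal sketch of the power-rule computation, with illustrative function names:

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as coeffs[i] = coefficient of x**i."""
    # Power rule: d/dx (c * x**i) = i * c * x**(i - 1); the constant term drops out.
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


f = [-7, 5, 0, -2, 3]   # f(x) = 3x^4 - 2x^3 + 5x - 7
df = differentiate(f)   # f'(x) = 12x^3 - 6x^2 + 5
print(df)               # [5, 0, -6, 12]
print(evaluate(df, 1))  # 11
```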
187
+
188
+ ---
189
+
190
+ ## Citation
191
+
192
+ If you use this model, please cite:
193
+
194
+ ```bibtex
195
+ @misc{farsak2025rn_tr_r2,
196
+ title = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
197
+ author = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
198
+ year = {2025},
199
+ howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
200
+ license = {Apache-2.0}
201
+ }
202
+ ```
203
+
204
+ ---
205
+
206
+ ## License
207
+
208
+ Apache 2.0, a permissive open-source license.
209
+
210
+ ---
211
+
212
+ ## Authors
213
+
214
+ * **Authors**: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro
215
+
216
+
217
+
218
+
219
+