Update README.md
README.md
CHANGED
@@ -23,7 +23,7 @@ language:
 
 ## 🎮 Overview
 
-QuadConnect2.5-0.5B is a specialized language model trained to master the game of Connect Four. Built on Qwen 2.5 (0.5B parameter base), this model uses GRPO (
+QuadConnect2.5-0.5B is a specialized language model trained to master the game of Connect Four. Built on Qwen 2.5 (0.5B parameter base), this model uses GRPO (Group Relative Policy Optimization) to learn the strategic intricacies of Connect Four gameplay.
 
 **Status**: Early training experiments (v0.0.9b) - Reward functions still evolving
 