This is a fine-tuned version of unsloth/Phi-4, trained with GRPO (Group Relative Policy Optimization) for 1,000 steps on the GSM8K dataset to enhance its reasoning capabilities.
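GRPO samples a group of completions per prompt, scores each one with a reward function, and normalizes every reward against the mean and standard deviation of its own group. The resulting advantage is the learning signal, so no separate value model is needed. The sketch below illustrates that step on GSM8K-style data. It is not the training code used for this model: the exact-match correctness reward, the `####` answer marker being present in completions, and the helper names `extract_gsm8k_answer`, `correctness_reward`, and `group_relative_advantages` are all hypothetical. It also uses the population standard deviation, and real implementations may differ in that detail.

```python
import re
import statistics
from typing import List, Optional


def extract_gsm8k_answer(text: str) -> Optional[str]:
    """Return the number after the '####' marker that GSM8K uses for final answers."""
    match = re.search(r"####\s*(-?[\d,\.]+)", text)
    if match is None:
        return None
    # Normalize "1,000." -> "1000" so formatting differences don't affect matching
    return match.group(1).replace(",", "").rstrip(".")


def correctness_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the final answers match exactly, else 0.0."""
    predicted = extract_gsm8k_answer(completion)
    expected = extract_gsm8k_answer(reference)
    return 1.0 if predicted is not None and predicted == expected else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its group's mean and (population) std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion in the group scores the same
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "Natalia sold 48 / 2 = 24 clips in May, so 48 + 24 = 72 in total. #### 72"
    # One group of sampled completions for the same prompt (made-up examples)
    completions = ["... #### 72", "... #### 70", "... #### 72", "no final answer"]
    rewards = [correctness_reward(c, reference) for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Completions that beat their group's average receive a positive advantage and the rest a negative one. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient.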