R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).
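
As a rough illustration of the "group relative" part of the name, the sketch below normalizes each sampled response's reward against the mean and standard deviation of its own group, as in standard GRPO. It assumes scalar rewards and does not reproduce the step-wise reward design that StepGRPO adds. The function name `group_relative_advantages` and the reward values are hypothetical, not taken from the R1-VL code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: rewards are plain floats, one per sampled response
    to the same prompt. eps guards against a zero standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
advantages = group_relative_advantages([0.0, 0.5, 0.5, 1.0])
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.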