Yesterday we published the first large evaluation of the new model, showing that it absolutely leaves the competition in the dust. We have now made the results and data available here! Please check it out and β€οΈ !
GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks?
The answer is YES and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and eval on RefCOCO Val and RefGTA (an OOD task).