Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 484
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14 • 28
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27, 2024 • 55