This is functionally similar to vidore/colqwen2-v1.0, but is only 87M and is based on the upstream Qwen/Qwen2-VL-2B-Instruct.