Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28
microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 • Zero-Shot Image Classification • Updated Jan 14
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography • Paper • 2409.18119 • Published Sep 26, 2024
Conditional Generation of Audio from Video via Foley Analogies • Paper • 2304.08490 • Published Apr 17, 2023