Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published 6 days ago • 27
BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing Paper • 2206.15076 • Published Jun 30, 2022 • 4
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure Paper • 2504.10049 • Published 9 days ago • 3
Summarization of long multimodal documents Collection Resources related to summarization of long multimodal documents such as scientific presentations. • 5 items • Updated 7 days ago
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 398
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 77
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 14 days ago • 622k • 1.32k
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Mar 4 • 73