Multimodality-Vision Focused - a MLap Collection

updated Mar 19

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 95

Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published Oct 17, 2024 • 32

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 71

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 71

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 83

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 368

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 94

Optimized Table Tokenization for Table Structure Recognition

Paper • 2305.03393 • Published May 5, 2023