MLap's Collections
Multimodality-Vision Focused

updated Mar 19

  • Emu3: Next-Token Prediction is All You Need

    Paper • 2409.18869 • Published Sep 27, 2024 • 95

  • Harnessing Webpage UIs for Text-Rich Visual Understanding

    Paper • 2410.13824 • Published Oct 17, 2024 • 32

  • PaliGemma: A versatile 3B VLM for transfer

    Paper • 2407.07726 • Published Jul 10, 2024 • 71

  • YaRN: Efficient Context Window Extension of Large Language Models

    Paper • 2309.00071 • Published Aug 31, 2023 • 71

  • MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Paper • 2408.01800 • Published Aug 3, 2024 • 83

  • Qwen2.5 Technical Report

    Paper • 2412.15115 • Published Dec 19, 2024 • 368

  • Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    Paper • 2311.06242 • Published Nov 10, 2023 • 94

  • Optimized Table Tokenization for Table Structure Recognition

    Paper • 2305.03393 • Published May 5, 2023