Llasa Collection TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 11 items • Updated 4 days ago • 18
Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning Paper • 2407.15815 • Published Jul 22, 2024 • 14
Wavelets Are All You Need for Autoregressive Image Generation Paper • 2406.19997 • Published Jun 28, 2024 • 32
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3, 2024 • 96
Running on Zero 759 759 Florence 2 📉 Analyze images to generate captions, detect objects, or perform OCR
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 61