view article Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC 19 days ago • 21
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 400
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Mar 4 • 74
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 15 items • Updated 9 days ago • 228
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 116
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 35
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 45
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 35