Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!
EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA can generate both textual and speech responses with vivid emotional control by utilizing a speech decoder and a style controller.
EMOVA Highlights
- State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
- Device adaptation: our codebase supports training and inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
- Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, including the most recent DeepSeekMoE-tiny!
We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!
Reacted to tomaarsen's post, 3 months ago:
A consortium of 18 European companies, labs, and universities has banded together to launch EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be fine-tuned for retrieval, classification, etc.
- 15 languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
- 3 model sizes: 210M, 610M, and 2.1B parameters - very useful sizes in my opinion
- Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
- Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
- A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
- Evaluated against mDeBERTa, mGTE, and XLM-RoBERTa for retrieval, classification, and regression (after fine-tuning for each task separately): EuroBERT punches way above its weight.
- Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
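The decoder-to-encoder change mentioned above (swapping Llama's causal attention for bi-directional attention) can be illustrated with plain attention masks. This is a toy sketch of the general idea, not EuroBERT's actual implementation; the helper names and NumPy usage are mine:

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular mask used by decoder-only models like Llama:
    position i may only attend to positions j <= i (no peeking ahead)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def bidirectional_mask(seq_len, padding=None):
    """Full mask used by encoders: every (non-padding) position may
    attend to every other (non-padding) position, past and future."""
    mask = np.ones((seq_len, seq_len), dtype=bool)
    if padding is not None:
        # Optionally block attention to/from padding positions.
        mask &= padding[None, :] & padding[:, None]
    return mask

# For a 4-token sequence: the causal mask allows only the lower
# triangle (10 query/key pairs), the bidirectional mask allows all 16.
print(causal_mask(4).sum())         # 10
print(bidirectional_mask(4).sum())  # 16
```

The only architectural difference sketched here is which query/key pairs are allowed; everything else (the Llama block structure) stays the same, which is why an encoder can reuse a decoder design so directly.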
The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!
Duality is super excited to announce that our Kaggle competition, the Synthetic-to-Real Object Detection Challenge, is LIVE! Want to master AI training, learn industry-proven synthetic data workflows, and compete for public recognition and cash prizes?
Compete to build the top-performing model capable of detecting real-world objects, trained entirely on synthetic data. Master these industry-proven methods for faster, more targeted, and more diverse dataset creation, and set yourself apart by unlocking today's most exciting AI opportunities.
Ready to test your skills?
The Challenge
Train an object detection model using synthetic images created with Falcon, Duality AI's cutting-edge digital twin simulation software, then evaluate your model on real-world imagery.
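Object-detection challenges are typically scored with metrics built on Intersection-over-Union (IoU) between predicted and ground-truth boxes; the post doesn't state this competition's exact metric, so the following is only a minimal sketch of IoU itself, with box format and function name chosen by me:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes
    given as (x1, y1, x2, y2) corner coordinates."""
    # Intersection rectangle (empty if the boxes don't overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x10 strip: IoU = 50 / 150 = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction is usually counted as correct when its IoU with a ground-truth box exceeds some threshold (0.5 is a common choice), which is how aggregate metrics like mAP are built up.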
Win Cash Prizes & Recognition
- Earn cash and public shout-outs from the Duality AI accounts.
Enhance Your Portfolio
- Demonstrate your real-world AI and ML expertise in object detection to prospective employers and collaborators.
Expand Your Network
- Engage, compete, and collaborate with fellow ML engineers, researchers, and students.
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages; excels in image captioning, VQA & more
- Integrated into transformers from day 0!
Just achieved 25m 59s of research with plain ChatGPT! Had it doing a complete internet search in just ONE call, visiting 443 websites! Hard to beat, huh? PROMPT IN COMMENTS. Check out the massive article created by the prompt: https://huggingface.co/blog/luigi12345/automating-lead-generation-with-ai
Exciting news, everyone! I've just released **Thespis-Llama-3.1-8B**, a new language model designed for enhanced roleplaying!
It's built on Llama-3.1 and fine-tuned with a focus on Theory of Mind reasoning to create more believable and engaging characters. It even learned a few tricks on its own, like adding in-character thought processes!
Give it a try and let me know what you think! I'm especially interested in feedback on how well the characters stay in role and whether the responses feel natural. Looking forward to seeing what amazing stories you create!