WORLDMEM: Long-term Consistent World Simulation with Memory Paper • 2504.12369 • Published 4 days ago • 28
DataDecide Collection A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 358 items • Updated 4 days ago • 11
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 12 days ago • 101
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published 21 days ago • 17
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 3 items • Updated 25 days ago • 89
Gemstone Models Collection Our 22 open-source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths from 256 to 3072 and 18 depths from 3 to 80. • 59 items • Updated Feb 26 • 8
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 27 days ago • 59
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published Feb 20 • 48
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 21 days ago • 446
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 383