CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 4 days ago • 85
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published 7 days ago • 14
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published 7 days ago • 27
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published 7 days ago • 31
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published 14 days ago • 117
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 10 days ago • 38
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 10 days ago • 119
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 11 days ago • 26
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 14 days ago • 164
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models Paper • 2504.04823 • Published 15 days ago • 29
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published 18 days ago • 52
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published 20 days ago • 39
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? Paper • 2504.00509 • Published 21 days ago • 21
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 25 days ago • 43