70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published 10 days ago • 15
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference Paper • 2504.10326 • Published 11 days ago • 25
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published 18 days ago • 123
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 11 days ago • 239
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published about 1 month ago • 46
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 23 days ago • 82
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models Paper • 2504.04823 • Published 18 days ago • 30
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 22 days ago • 35
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 17 days ago • 81
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 17 days ago • 104
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20 • 48
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Paper • 2503.15558 • Published Mar 18 • 46
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 20 • 24
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 71