Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published Mar 4
Headless Language Models: Learning without Predicting with Contrastive Weight Tying Paper • 2309.08351 • Published Sep 15, 2023