giulio98's Collections
KV Cache Compression
SnapKV: LLM Knows What You are Looking for Before Generation (arXiv:2404.14469)
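A minimal sketch of SnapKV's core step, which scores prefix positions by how much the last few prompt tokens (the "observation window") attend to them, pools the scores so kept positions form clusters, and retains the top scorers plus the window itself. Single-head code; the function name, window size, pooling kernel, and budget are illustrative choices, not the paper's API.

```python
import torch
import torch.nn.functional as F

def snapkv_select(keys, values, queries, window=32, budget=256):
    """keys/values/queries: (seq, d) for one head over the prompt.
    Returns a compressed cache of at most budget + window positions."""
    seq, d = keys.shape
    if seq <= budget + window:
        return keys, values
    obs_q = queries[-window:]                    # observation-window queries
    prefix_k = keys[:-window]
    scores = (obs_q @ prefix_k.T) / d ** 0.5     # (window, seq - window)
    votes = scores.softmax(dim=-1).sum(dim=0)    # aggregate over the window
    # max-pool so selected positions come in small clusters, as SnapKV does
    votes = F.max_pool1d(votes[None, None], 7, stride=1, padding=3)[0, 0]
    keep = votes.topk(budget).indices.sort().values
    idx = torch.cat([keep, torch.arange(seq - window, seq)])
    return keys[idx], values[idx]

k, v, q = (torch.randn(1024, 64) for _ in range(3))
ck, cv = snapkv_select(k, v, q)
print(ck.shape)  # torch.Size([288, 64])
```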
Finch: Prompt-guided Key-Value Cache Compression (arXiv:2408.00167)
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning (arXiv:2503.04973)
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression (arXiv:2406.11430)
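The paper's finding is that keys with a low L_2 norm tend to receive high attention, so the cache can be compressed by keeping only the lowest-norm keys, with no attention scores computed at all. A minimal sketch under that reading; the names and keep ratio are my own.

```python
import torch

def l2_compress(keys, values, keep_ratio=0.5):
    """keys/values: (seq, d). Keep the fraction of positions whose key
    vectors have the smallest L2 norm (low norm ~ high attention)."""
    n_keep = max(1, int(keys.shape[0] * keep_ratio))
    norms = keys.norm(dim=-1)
    keep = norms.topk(n_keep, largest=False).indices.sort().values
    return keys[keep], values[keep]

k, v = torch.randn(512, 64), torch.randn(512, 64)
ck, cv = l2_compress(k, v)
print(ck.shape)  # torch.Size([256, 64])
```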
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation (arXiv:2502.01068)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference (arXiv:2502.00299)
Efficient Streaming Language Models with Attention Sinks (arXiv:2309.17453)
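StreamingLLM's eviction rule keeps the first few tokens, which act as "attention sinks" absorbing attention mass, plus a sliding window of recent tokens, and drops everything in between. A minimal sketch with illustrative defaults:

```python
import torch

def sink_window_evict(keys, values, n_sink=4, window=1020):
    """keys/values: (seq, d). Keep n_sink initial tokens + recent window."""
    seq = keys.shape[0]
    if seq <= n_sink + window:
        return keys, values
    idx = torch.cat([torch.arange(n_sink), torch.arange(seq - window, seq)])
    return keys[idx], values[idx]
```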
Transformers are Multi-State RNNs (arXiv:2401.06104)
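This paper views a size-capped KV cache as a multi-state RNN and proposes the TOVA policy: whenever the cache exceeds its budget, drop the position the current query attends to least. A minimal single-head sketch; the names are hypothetical.

```python
import torch

def tova_step(keys, values, q, budget=512):
    """keys/values: (cache, d); q: (d,) query of the token being decoded."""
    if keys.shape[0] <= budget:
        return keys, values
    attn = (keys @ q) / keys.shape[-1] ** 0.5
    drop = attn.argmin()                          # least-attended position
    keep = torch.arange(keys.shape[0]) != drop
    return keys[keep], values[keep]
```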
H_2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (arXiv:2306.14048)
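H_2O keeps "heavy hitters," cached tokens with large accumulated attention mass, alongside the most recent tokens, and evicts the rest. The sketch below assumes the caller maintains acc_scores, a running sum of the attention each cached position has received during decoding; all names are illustrative.

```python
import torch

def h2o_evict(keys, values, acc_scores, budget=256, recent=32):
    """keys/values: (seq, d); acc_scores: (seq,). Keep top-budget heavy
    hitters among older tokens plus the last `recent` tokens."""
    seq = keys.shape[0]
    if seq <= budget + recent:
        return keys, values, acc_scores
    keep = acc_scores[:-recent].topk(budget).indices.sort().values
    idx = torch.cat([keep, torch.arange(seq - recent, seq)])
    return keys[idx], values[idx], acc_scores[idx]
```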
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression (arXiv:2503.02812)
ThinK: Thinner Key Cache by Query-Driven Pruning (arXiv:2407.21018)
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation (arXiv:2410.13846)
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (arXiv:2410.10819)
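DuoAttention separates heads into retrieval heads, which keep the full cache, and streaming heads, which get by on attention sinks plus a recent window. The paper learns this split with trainable gates; the sketch below takes the per-head labels as given and shows only the resulting cache policy.

```python
import torch

def duo_compress(keys, values, is_retrieval_head, n_sink=4, window=256):
    """keys/values: (heads, seq, d); is_retrieval_head: (heads,) bool.
    Returns ragged per-head key/value lists."""
    heads, seq, _ = keys.shape
    full_idx = torch.arange(seq)
    stream_idx = full_idx if seq <= n_sink + window else torch.cat(
        [torch.arange(n_sink), torch.arange(seq - window, seq)])
    out_k, out_v = [], []
    for h in range(heads):
        idx = full_idx if is_retrieval_head[h] else stream_idx
        out_k.append(keys[h, idx])
        out_v.append(values[h, idx])
    return out_k, out_v
```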
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time (arXiv:2305.17118)
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling (arXiv:2406.02069)
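PyramidKV's premise is that attention "funnels" with depth: lower layers attend broadly and deserve larger cache budgets, upper layers attend narrowly and need less. The linear ramp below is only an illustrative schedule that roughly preserves the total budget of a uniform allocation; the paper's actual allocation rule differs in detail.

```python
def pyramid_budgets(n_layers=32, total=32 * 256, min_budget=64):
    """Linearly decreasing per-layer KV budgets, largest at layer 0."""
    avg = total // n_layers
    top = min_budget
    bottom = 2 * avg - top          # symmetric ramp keeps ~the same total
    step = (bottom - top) / (n_layers - 1)
    return [round(bottom - i * step) for i in range(n_layers)]

print(pyramid_budgets()[:3], pyramid_budgets()[-3:])
# [448, 436, 423] [89, 76, 64]
```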