Establishing Task Scaling Laws via Compute-Efficient Model Ladders • arXiv:2412.04403 • Published Dec 5, 2024
DataDecide: How to Predict Best Pretraining Data with Small Experiments • arXiv:2504.11393 • Published Apr 2025
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research • arXiv:2402.00159 • Published Jan 31, 2024
HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation • arXiv:2212.10315 • Published Dec 20, 2022
Continued Pretraining for Better Zero- and Few-Shot Promptability • arXiv:2210.10258 • Published Oct 19, 2022
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets • arXiv:2312.10253 • Published Dec 15, 2023
Paloma: A Benchmark for Evaluating Language Model Fit • arXiv:2312.10523 • Published Dec 16, 2023