Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 19 days ago
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published Sep 27, 2024