arxiv:2505.06496

xGen-small Technical Report

Published on May 10

Authors:

Erik Nijkamp, Yingbo Zhou, Caiming Xiong

Abstract

xGen-small, a family of 4B and 9B Transformer decoder models, targets long-context applications through a pipeline that combines data curation, multi-stage pre-training, and targeted post-training, and it performs strongly on math and coding tasks.

AI-generated summary

We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning. xGen-small delivers strong performance across a range of tasks, especially in the math and coding domains, while excelling on long-context benchmarks.