Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ This repository provides Japanese ModernBERT trained by [SB Intuitions](https://
 [ModernBERT](https://arxiv.org/abs/2412.13663) is a new variant of the BERT model that combines local and global attention, allowing it to handle long sequences while maintaining high computational efficiency.
 It also incorporates modern architectural improvements, such as [RoPE](https://arxiv.org/abs/2104.09864).
 
-Our ModernBERT-Ja-310M is trained on a high-quality corpus of Japanese and English text comprising **4.
+Our ModernBERT-Ja-310M is trained on a high-quality corpus of Japanese and English text comprising **4.09T tokens**, featuring a vocabulary size of 102,400 and a sequence length of **8,192** tokens.
 
 
 ## How to Use
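The hunk above ends at the `## How to Use` heading. For readers of this diff, usage reduces to the standard `transformers` fill-mask workflow; a minimal sketch, assuming the `sbintuitions/modernbert-ja-310m` checkpoint id and a `transformers` release recent enough to include ModernBERT (the example sentence is illustrative):

```python
from transformers import pipeline

# Fill-mask pipeline; checkpoint id assumed to be sbintuitions/modernbert-ja-310m.
fill_mask = pipeline("fill-mask", model="sbintuitions/modernbert-ja-310m")

# Read the mask token from the tokenizer instead of hardcoding it.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"日本の首都は{mask}です。"):  # "The capital of Japan is <mask>."
    print(pred["token_str"], pred["score"])
```

Reading the mask token from the tokenizer avoids hardcoding `[MASK]` versus `<mask>`, which varies across checkpoints.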
@@ -81,7 +81,7 @@ Next, we conducted two phases of context length extension.
 
 1. **Pre-training**
   - Training with **3.51T tokens**, including Japanese and English data extracted from web corpora.
-  - The sequence length is 1,024 with
+  - The sequence length is 1,024 with [best-fit packing](https://arxiv.org/abs/2404.10830).
   - Masking rate is **30%** (with 80-10-10 rule).
 2. **Context Extension (CE): Phase 1**
   - Training with **430B tokens**, comprising high-quality Japanese and English data.