prajdabre committed · Commit f2156c1 · verified · 1 Parent(s): e657883

Update README.md

Files changed (1):
  1. README.md (+19 -11)
README.md CHANGED
@@ -5,7 +5,7 @@ license: mit
  These models are created from their respective IndicTrans2 parent versions by simply replacing the Sinusoidal Positional Embedding with Rotary Positional Embedding ([Su _et al._](https://arxiv.org/abs/2104.09864)), and finetuning them for further alignment.
 
  *NOTE*:
- These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://arxiv.org/abs/2408.11382).
+ These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://aclanthology.org/2025.naacl-long.366/).
 
  Detailed information on the data mixture, hyperparameters, and training curriculum can be found in the paper.
 
@@ -17,7 +17,7 @@ The usage instructions are very similar to [IndicTrans2 HuggingFace models](http
  ```python
  import torch
  import warnings
- from IndicTransToolkit import IndicProcessor
+ from IndicTransToolkit.processor import IndicProcessor
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
  warnings.filterwarnings("ignore")
@@ -67,19 +67,27 @@ print(" | > Translations:", outputs[0])
  If you use these models directly or fine-tune them further for additional use cases, please cite the following work:
 
  ```bibtex
- @misc{gumma2025inducinglongcontextabilitiesmultilingual,
-   title={Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models},
-   author={Varun Gumma and Pranjal A. Chitale and Kalika Bali},
-   year={2025},
-   eprint={2408.11382},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL},
-   url={https://arxiv.org/abs/2408.11382},
+ @inproceedings{gumma-etal-2025-towards,
+   title = "Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models",
+   author = "Gumma, Varun and
+     Chitale, Pranjal A and
+     Bali, Kalika",
+   editor = "Chiruzzo, Luis and
+     Ritter, Alan and
+     Wang, Lu",
+   booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+   month = apr,
+   year = "2025",
+   address = "Albuquerque, New Mexico",
+   publisher = "Association for Computational Linguistics",
+   url = "https://aclanthology.org/2025.naacl-long.366/",
+   pages = "7158--7170",
+   ISBN = "979-8-89176-189-6"
  }
  ```
 
  # Note
- These new and improved models are primarily built and tested for document-level and long-context translation, so performance on shorter sentence-level tasks might be sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
+ These new and improved models are primarily built and tested for document-level and long-context translation, so performance on shorter sentence-level tasks might be slightly sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
 
  # Warning
  Occasionally, you may notice some variation in the output, which may not be optimal. In such cases, you can experiment with adjusting the `num_beams`, `repetition_penalty`, and `length_penalty` parameters in the `generation_config`. Based on standard testing, the example with an input size of 1457 can be run on a single A100 GPU. However, the 1B model might require more compute resources or a lower beam size for generation.
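The usage hunk above shows only the updated import. Below is a minimal sketch of where `IndicProcessor` fits in the usual IndicTrans2 HuggingFace workflow with the new import path; the model ID, language pair, and generation settings are placeholders rather than values taken from this repository's README.

```python
import torch
from IndicTransToolkit.processor import IndicProcessor
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "<rotary-indictrans2-model-id>"  # placeholder; substitute the actual checkpoint from this repo
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

ip = IndicProcessor(inference=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(DEVICE).eval()

# one long source document per list entry; IndicProcessor adds the language tags
docs = ["A long English document to translate into Hindi goes here."]
batch = ip.preprocess_batch(docs, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", return_tensors="pt").to(DEVICE)

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=2048)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(" | > Translations:", ip.postprocess_batch(decoded, lang="hin_Deva")[0])
```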
 
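If you hit the output variation described in the Warning above, the parameters it names can be set through a `GenerationConfig` or passed directly to `generate`. The values below are illustrative starting points, not recommendations from the model authors.

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    num_beams=4,             # lower the beam size if the 1B model runs out of memory
    repetition_penalty=1.1,  # values > 1.0 discourage repeated phrases
    length_penalty=1.0,      # > 1.0 favours longer outputs, < 1.0 shorter ones
    max_new_tokens=2048,
)

# either pass it per call:
#   generated = model.generate(**inputs, generation_config=gen_config)
# or attach it to the model so every generate() call uses it:
#   model.generation_config = gen_config
```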
 
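For background on the change described in the first hunk (sinusoidal positional embeddings swapped for rotary ones), here is a minimal illustration of rotary position embeddings in the style of Su _et al._; it is an illustrative sketch only, not the implementation used in these checkpoints.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate a (seq_len, head_dim) tensor of queries or keys by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per pair of dimensions, as in sinusoidal embeddings
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    # angle for every (position, frequency) combination
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1, x2) pair; dot products then depend on relative position
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

q = torch.randn(8, 64)   # 8 positions, head dimension 64
q_rot = apply_rope(q)    # applied to queries and keys before attention, with no learned parameters
```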