prajdabre committed · Commit f2156c1 · verified · 1 Parent(s): e657883

Update README.md

Files changed (1):
  1. README.md (+19 -11)
README.md CHANGED
@@ -5,7 +5,7 @@ license: mit
  These models are created from their respective IndicTrans2 parent versions by simply replacing the Sinusoidal Positional Embedding with Rotary Positional Embedding ([Su _et al._](https://arxiv.org/abs/2104.09864)), and finetuning them for further alignment.
 
  *NOTE*:
- These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://arxiv.org/abs/2408.11382).
+ These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://aclanthology.org/2025.naacl-long.366/).
 
  Detailed information on the data mixture, hyperparameters, and training curriculum can be found in the paper.
 
@@ -17,7 +17,7 @@ The usage instructions are very similar to [IndicTrans2 HuggingFace models](http
  ```python
  import torch
  import warnings
- from IndicTransToolkit import IndicProcessor
+ from IndicTransToolkit.processor import IndicProcessor
  from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
  warnings.filterwarnings("ignore")
@@ -67,19 +67,27 @@ print(" | > Translations:", outputs[0])
  If you use these models directly or fine-tune them further for additional use cases, please cite the following work:
 
  ```bibtex
- @misc{gumma2025inducinglongcontextabilitiesmultilingual,
-   title={Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models},
-   author={Varun Gumma and Pranjal A. Chitale and Kalika Bali},
-   year={2025},
-   eprint={2408.11382},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL},
-   url={https://arxiv.org/abs/2408.11382},
+ @inproceedings{gumma-etal-2025-towards,
+   title = "Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models",
+   author = "Gumma, Varun and
+     Chitale, Pranjal A and
+     Bali, Kalika",
+   editor = "Chiruzzo, Luis and
+     Ritter, Alan and
+     Wang, Lu",
+   booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+   month = apr,
+   year = "2025",
+   address = "Albuquerque, New Mexico",
+   publisher = "Association for Computational Linguistics",
+   url = "https://aclanthology.org/2025.naacl-long.366/",
+   pages = "7158--7170",
+   ISBN = "979-8-89176-189-6"
  }
  ```
 
  # Note
- These new and improved models are primarily built and tested for document-level and long-context translation, so performance on shorter sentence-level tasks might be sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
+ These new and improved models are primarily built and tested for document-level and long-context translation, so performance on shorter sentence-level tasks might be slightly sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
 
  # Warning
  Occasionally, you may notice some variation in the output, which may not be optimal. In such cases, you can experiment with adjusting the `num_beams`, `repetition_penalty`, and `length_penalty` parameters in the `generation_config`. Based on standard testing, the example with an input size of 1457 can be run on a single A100 GPU. However, the 1B model might require more compute resources or a lower beam size for generation.
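The usage hunk above shows only the updated import. Below is a minimal sketch of where `IndicProcessor` fits in the usual IndicTrans2 HuggingFace workflow with the new import path; the model ID, language pair, and generation settings are placeholders rather than values taken from this repository's README.

```python
import torch
from IndicTransToolkit.processor import IndicProcessor
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "<rotary-indictrans2-model-id>"  # placeholder; substitute the actual checkpoint from this repo
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

ip = IndicProcessor(inference=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(DEVICE).eval()

# one long source document per list entry; IndicProcessor adds the language tags
docs = ["A long English document to translate into Hindi goes here."]
batch = ip.preprocess_batch(docs, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", return_tensors="pt").to(DEVICE)

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=2048)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(" | > Translations:", ip.postprocess_batch(decoded, lang="hin_Deva")[0])
```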
 
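If you hit the output variation described in the Warning above, the parameters it names can be set through a `GenerationConfig` or passed directly to `generate`. The values below are illustrative starting points, not recommendations from the model authors.

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    num_beams=4,             # lower the beam size if the 1B model runs out of memory
    repetition_penalty=1.1,  # values > 1.0 discourage repeated phrases
    length_penalty=1.0,      # > 1.0 favours longer outputs, < 1.0 shorter ones
    max_new_tokens=2048,
)

# either pass it per call:
#   generated = model.generate(**inputs, generation_config=gen_config)
# or attach it to the model so every generate() call uses it:
#   model.generation_config = gen_config
```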
 
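For background on the change described in the first hunk (sinusoidal positional embeddings swapped for rotary ones), here is a minimal illustration of rotary position embeddings in the style of Su _et al._; it is an illustrative sketch only, not the implementation used in these checkpoints.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate a (seq_len, head_dim) tensor of queries or keys by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per pair of dimensions, as in sinusoidal embeddings
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    # angle for every (position, frequency) combination
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1, x2) pair; dot products then depend on relative position
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

q = torch.randn(8, 64)   # 8 positions, head dimension 64
q_rot = apply_rope(q)    # applied to queries and keys before attention, with no learned parameters
```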