metadata
language:
- as
- bn
- brx
- doi
- gom
- gu
- hi
- kn
- ks
- mai
- ml
- mr
- mni
- ne
- or
- pa
- sa
- sat
- snd
- ta
- te
- ur
language_details: >-
asm_Beng, ben_Beng, brx_Deva, doi_Deva, gom_Deva, guj_Gujr, hin_Deva,
kan_Knda, kas_Arab, mai_Deva, mal_Mlym, mar_Deva, mni_Mtei, npi_Deva,
ory_Orya, pan_Guru, san_Deva, sat_Olck, snd_Deva, tam_Taml, tel_Telu, urd_Arab
tags:
- indictrans2
- translation
- ai4bharat
- multilingual
license: mit
datasets:
- flores-200
- IN22-Gen
- IN22-Conv
metrics:
- bleu
- chrf
- chrf++
- comet
inference: false
IndicTrans2
This is the model card of IndicTrans2 Indic-Indic Distilled 320M variant adapted after stitching Indic-En Distilled 200M and En-Indic Distilled 200M variants.
Please refer to the blog for further details on model training, data and metrics.
Usage Instructions
Please refer to the github repository for a detail description on how to use HF compatible IndicTrans2 models for inference.
Note: IndicTrans2 is not compatible with AutoTokenizer, therefore we provide IndicTransTokenizer
Citation
If you consider using our work then please cite using:
@article{gala2023indictrans,
title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author={Jay Gala and Pranjal A Chitale and A K Raghavan and Varun Gumma and Sumanth Doddapaneni and Aswanth Kumar M and Janki Atul Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M Khapra and Raj Dabre and Anoop Kunchukuttan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=vfT4YuzAYA},
note={}
}