metadata

language:
  - as
  - bn
  - brx
  - doi
  - gom
  - gu
  - hi
  - kn
  - ks
  - mai
  - ml
  - mr
  - mni
  - ne
  - or
  - pa
  - sa
  - sat
  - snd
  - ta
  - te
  - ur
language_details: >-
  asm_Beng, ben_Beng, brx_Deva, doi_Deva, gom_Deva, guj_Gujr, hin_Deva,
  kan_Knda, kas_Arab, mai_Deva, mal_Mlym, mar_Deva, mni_Mtei, npi_Deva,
  ory_Orya, pan_Guru, san_Deva, sat_Olck, snd_Deva, tam_Taml, tel_Telu, urd_Arab
tags:
  - indictrans2
  - translation
  - ai4bharat
  - multilingual
license: mit
datasets:
  - flores-200
  - IN22-Gen
  - IN22-Conv
metrics:
  - bleu
  - chrf
  - chrf++
  - comet
inference: false

IndicTrans2

This is the model card for the IndicTrans2 Indic-Indic Distilled 320M variant, which was adapted by stitching together the Indic-En Distilled 200M and En-Indic Distilled 200M variants.

Please refer to the blog for further details on model training, data and metrics.

Usage Instructions

Please refer to the GitHub repository for a detailed description of how to use the HF-compatible IndicTrans2 models for inference.

Note: IndicTrans2 is not compatible with AutoTokenizer; therefore, we provide IndicTransTokenizer.
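
Before running inference, it can help to confirm that both the source and target languages are among the 22 tags listed in this card's `language_details` metadata. The following is a minimal sketch of such a check, not part of the IndicTrans2 codebase: the helper name `check_pair` is hypothetical, and the tag list is simply copied from the metadata above. It does not call the tokenizer or the model; see the GitHub repository for the actual inference API.

```python
# Language tags copied verbatim from this card's `language_details` field.
SUPPORTED_TAGS = (
    "asm_Beng ben_Beng brx_Deva doi_Deva gom_Deva guj_Gujr hin_Deva "
    "kan_Knda kas_Arab mai_Deva mal_Mlym mar_Deva mni_Mtei npi_Deva "
    "ory_Orya pan_Guru san_Deva sat_Olck snd_Deva tam_Taml tel_Telu urd_Arab"
).split()


def check_pair(src_lang: str, tgt_lang: str) -> tuple[str, str]:
    """Hypothetical helper: reject tags that this card does not list."""
    for tag in (src_lang, tgt_lang):
        if tag not in SUPPORTED_TAGS:
            raise ValueError(f"unsupported language tag: {tag!r}")
    return src_lang, tgt_lang


# Example: a Hindi-to-Tamil request passes the check unchanged.
src, tgt = check_pair("hin_Deva", "tam_Taml")
```

A tag such as `eng_Latn` is not listed for this Indic-Indic variant, so the sketch raises `ValueError` for it rather than passing it on to the inference code.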

Citation

If you use our work, please cite it as follows:

@article{gala2023indictrans,
title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author={Jay Gala and Pranjal A Chitale and A K Raghavan and Varun Gumma and Sumanth Doddapaneni and Aswanth Kumar M and Janki Atul Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M Khapra and Raj Dabre and Anoop Kunchukuttan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=vfT4YuzAYA},
note={}
}