IndicTrans2 HF Compatible Models
This section describes how to run our IndicTrans2 models, which were originally trained with fairseq, in HuggingFace Transformers for inference. Our scripts for the HuggingFace-compatible models are adapted from the M2M100 repository.
Setup
To get started, follow these steps to set up the environment:
# Clone the github repository and navigate to the project directory.
git clone https://github.com/AI4Bharat/IndicTrans2
cd IndicTrans2
# Install all the dependencies and requirements associated with the project for running HF compatible models.
source install.sh
Note: The install.sh script in this directory is specifically for running HF compatible models for inference.
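To confirm that the setup worked, a quick import check can help. The following is a minimal sketch, not part of the repository: the package list (`torch`, `transformers`) is an assumption about what install.sh provides, and the helper name `missing_packages` is hypothetical. The install.sh script remains the authoritative list of dependencies.

```python
import importlib.util

# Assumed package list; install.sh is the authoritative source of dependencies.
ASSUMED_PACKAGES = ("torch", "transformers")


def missing_packages(names=ASSUMED_PACKAGES):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

If any package is reported as missing, re-running install.sh in a fresh shell is a reasonable first step.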
Models
Model | 🤗 HuggingFace Checkpoints |
---|---|
Preprint En-Indic | ai4bharat/indictrans2-en-indic-1B |
Preprint Indic-En | ai4bharat/indictrans2-indic-en-1B |
Inference
Once the setup is complete, you can perform inference with HuggingFace Transformers.
You can start with the provided example.py script and customize it for your specific translation use case:
python3 example.py
Feel free to modify the example.py script to suit your translation needs.
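When adapting example.py, the first step is usually choosing the checkpoint for the translation direction and loading it. The sketch below illustrates one way to do that under stated assumptions: the checkpoint names come from the Models table, while the helper names (`checkpoint_for`, `load_model`), the language codes such as `eng_Latn` and `hin_Deva`, and the use of `trust_remote_code=True` are assumptions rather than confirmed details of the repository. Input preprocessing and output postprocessing are not shown; example.py is the reference for those steps.

```python
# Checkpoint names are taken from the Models table.
CHECKPOINTS = {
    "en-indic": "ai4bharat/indictrans2-en-indic-1B",
    "indic-en": "ai4bharat/indictrans2-indic-en-1B",
}

ENGLISH = "eng_Latn"  # assumed language code for English


def checkpoint_for(src_lang, tgt_lang):
    """Pick a checkpoint name based on the translation direction."""
    if src_lang == ENGLISH and tgt_lang != ENGLISH:
        return CHECKPOINTS["en-indic"]
    if tgt_lang == ENGLISH and src_lang != ENGLISH:
        return CHECKPOINTS["indic-en"]
    raise ValueError(f"No checkpoint listed for {src_lang} -> {tgt_lang}")


def load_model(checkpoint):
    """Load tokenizer and model; downloads the weights on first use."""
    # Imported lazily so checkpoint_for works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # trust_remote_code=True is an assumption: it is needed only if the
    # checkpoint ships custom modeling or tokenizer code.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, trust_remote_code=True)
    return tokenizer, model
```

For example, `load_model(checkpoint_for("eng_Latn", "hin_Deva"))` would load the En-Indic checkpoint, assuming those language codes match what the model expects.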
Citation
@article{ai4bharat2023indictrans2,
title = {IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author = {AI4Bharat and Jay Gala and Pranjal A. Chitale and Raghavan AK and Sumanth Doddapaneni and Varun Gumma and Aswanth Kumar and Janki Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M. Khapra and Raj Dabre and Anoop Kunchukuttan},
year = {2023},
journal = {arXiv preprint arXiv: 2305.16307}
}