Model Card for eng-nyi-nmt

The eng-nyi-nmt model is a neural machine translation (NMT) model fine-tuned on EnNyiCorp, a corpus of English-Nyishi sentence pairs that is still under development. Nyishi, a low-resource language spoken in Arunachal Pradesh, India, faces challenges due to the scarcity of digital resources and linguistic datasets. This model aims to support Nyishi translation, helping to preserve and promote the language's use in digital spaces.

To develop eng-nyi-nmt, the pre-trained Helsinki-NLP/opus-mt-en-hi (English-to-Hindi) model was used as the foundation, given the structural similarities between Hindi and Nyishi that the authors observed in a multilingual context. Transfer learning from this model allowed the Nyishi translation model to be adapted efficiently despite limited language data.

Model Details

Model Description

Developed by: Tungon Dugi and Nabam Kakum
Affiliation: National Institute of Technology Arunachal Pradesh, India
Email: tungondugi@gmail.com or tungon.phd24@nitap.ac.in
Model type: Translation
Language(s) (NLP): English (en) and Nyishi (nyi)
Finetuned from model: Helsinki-NLP/opus-mt-en-hi

Uses

Direct Use

This model can be used for English-to-Nyishi translation, a text-to-text generation task.

How to Get Started with the Model

Use the code below to get started with the model:

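```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")

# Translate an English sentence into Nyishi
inputs = tokenizer(["How are you?"], return_tensors="pt")
print(tokenizer.batch_decode(model.generate(**inputs), skip_special_tokens=True))
```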

Training Details

Training Data

The model was trained on the EnNyiCorp dataset, which consists of aligned English-Nyishi sentence pairs. The dataset was curated to support machine translation for low-resource languages, with a focus on preserving and promoting the Nyishi language in digital spaces.
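The card does not describe how EnNyiCorp is stored or split. As an illustration only, the sketch below assumes the corpus is available as two line-aligned lists (English and Nyishi) and shows a typical preparation step: strip whitespace, drop pairs with an empty side or exact duplicates, then make a reproducible train/validation/test split. The function name `prepare_pairs`, the 10%/10% split, and the seed are hypothetical choices, not the authors' pipeline.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def prepare_pairs(
    english: List[str],
    nyishi: List[str],
    valid_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[Pair]]:
    """Clean aligned sentence pairs and split them into train/valid/test sets.

    Hypothetical helper: assumes line i of `english` translates line i of `nyishi`.
    """
    if len(english) != len(nyishi):
        raise ValueError("English and Nyishi sides must have the same number of lines")
    seen = set()
    pairs: List[Pair] = []
    for en, nyi in zip(english, nyishi):
        en, nyi = en.strip(), nyi.strip()
        # Drop pairs with an empty side and exact duplicate pairs
        if not en or not nyi or (en, nyi) in seen:
            continue
        seen.add((en, nyi))
        pairs.append((en, nyi))
    # Shuffle with a fixed seed so the split is reproducible
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_frac)
    n_valid = int(len(pairs) * valid_frac)
    return {
        "test": pairs[:n_test],
        "valid": pairs[n_test:n_test + n_valid],
        "train": pairs[n_test + n_valid:],
    }
```

Holding out the test pairs before any training keeps the BLEU evaluation on unseen sentences, which matters especially when the corpus is small.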

Evaluation

The model was evaluated for translation quality using the BLEU score, and its evaluation runtime was also recorded.

Metric	Value
BLEU Score	0.1468
Evaluation Runtime	1237.5341 seconds

The BLEU score of 0.1468 indicates a baseline level of English-to-Nyishi translation quality, consistent with the limited training data available. Further refinement is needed, but the result is an encouraging starting point toward accurate translations.
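The card does not say which BLEU implementation produced the reported score. For reference, the sketch below computes corpus-level BLEU in the standard way: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing; the names `ngrams` and `corpus_bleu` are hypothetical. Tools such as sacreBLEU use their own tokenization and report scores on a 0-100 scale, so this sketch's output is not directly comparable to the figure above.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-1 scale with one reference per sentence.

    Assumes whitespace tokenization, uniform weights over n = 1..max_n,
    clipped n-gram counts, the standard brevity penalty, and no smoothing.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(round(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]), 3))  # 0.537
```

In the example the precisions are 5/6, 3/5, 2/4 and 1/3 and both sentences have six tokens, so the brevity penalty is 1 and BLEU is (1/12)^(1/4), about 0.537. Because BLEU rewards exact n-gram overlap, a morphologically rich, low-resource target language with few references tends to receive low scores even for partially correct translations.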

Summary

The eng-nyi-nmt model is at an early stage of development and offers initial translation capabilities between English and Nyishi. Expanding the dataset and increasing training resources are key to improving generalization and translation accuracy for practical applications, and continued refinement is needed to meet the needs of this extremely low-resource language.
