---
license: cc-by-nd-4.0
language:
- en
- nyi
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-en-hi
pipeline_tag: translation
library_name: transformers
tags:
- english
- nyishi
- nmt
- translation
- nlp
---
Model Card for eng-nyi-nmt
The eng-nyi-nmt model is a neural machine translation (NMT) model fine-tuned on EnNyiCorp, a corpus of aligned English-Nyishi sentence pairs that is still under development. Nyishi is a low-resource language spoken in Arunachal Pradesh, India, and its digital use is held back by a scarcity of digital resources and linguistic datasets. This model aims to support English-to-Nyishi translation and to help preserve and promote the use of Nyishi in digital spaces.
To develop eng-nyi-nmt, the pre-trained English-to-Hindi model Helsinki-NLP/opus-mt-en-hi was used as a foundation, given the structural similarities between Hindi and Nyishi in a multilingual context. Transfer learning from this model made it possible to adapt it to Nyishi efficiently, even with limited training data.
Model Details
Model Description
- Developed by: Tungon Dugi and Nabam Kakum
- Affiliation: National Institute of Technology Arunachal Pradesh, India
- Email: tungondugi@gmail.com or tungon.phd24@nitap.ac.in
- Model type: Translation (sequence-to-sequence, MarianMT architecture inherited from the base model)
- Language(s) (NLP): English (en) and Nyishi (nyi)
- Finetuned from model: Helsinki-NLP/opus-mt-en-hi
Uses
Direct Use
This model can be used directly to translate English text into Nyishi.
How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the tokenizer and fine-tuned weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")
```
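Loading the model is only the first step; to actually produce translations, the tokenized English input is passed to `model.generate` and the output IDs are decoded. The sketch below shows one way to do this in batches. It assumes PyTorch is installed, and the helper names `chunk` and `translate` as well as the `batch_size`, `num_beams` and `max_new_tokens` values are illustrative choices, not settings published for this model.

```python
# Minimal sketch. Assumptions: PyTorch backend; batch_size, num_beams and
# max_new_tokens are illustrative defaults, not published settings for this model.
from typing import Iterator, List


def chunk(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: List[str], model_id: str = "repleeka/eng-nyi-nmt",
              batch_size: int = 8, num_beams: int = 4,
              max_new_tokens: int = 128) -> List[str]:
    """Translate English sentences into Nyishi, one padded batch at a time."""
    # Imported lazily so `chunk` can be used without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    outputs: List[str] = []
    for batch in chunk(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate overly long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs, num_beams=num_beams,
                                   max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"])` downloads the model on first use and returns a list with one translated string. Beam search is used here only because it is a common default for MarianMT-style models; greedy decoding (`num_beams=1`) is faster if quality is less critical.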
Training Details
Training Data
The model was trained on the EnNyiCorp dataset, which consists of aligned English-Nyishi sentence pairs. The dataset was curated to support machine translation for low-resource languages, with a focus on preserving and promoting the Nyishi language in digital spaces.
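The storage format of EnNyiCorp is not described in this card. As a hypothetical illustration of how aligned sentence pairs are typically prepared before fine-tuning, the sketch below assumes UTF-8 lines of the form `english<TAB>nyishi`, keeps only well-formed pairs with both sides non-empty, and holds out a reproducible test split. The file layout, the 10% test ratio, the seed, and the function names `parse_pairs` and `split_pairs` are all assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: the TSV layout, test ratio and seed are assumptions,
# not details published for EnNyiCorp.
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (english, nyishi)


def parse_pairs(lines: List[str]) -> List[Pair]:
    """Parse tab-separated lines into (english, nyishi) pairs, skipping malformed ones."""
    pairs: List[Pair] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # not exactly one source column and one target column
        src, tgt = parts[0].strip(), parts[1].strip()
        if src and tgt:  # drop pairs where either side is empty
            pairs.append((src, tgt))
    return pairs


def split_pairs(pairs: List[Pair], test_ratio: float = 0.1,
                seed: int = 42) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle a copy of `pairs` reproducibly and return (train, test)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

Keeping the split seeded means the held-out pairs stay fixed across training runs, so scores from different runs remain comparable.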
Evaluation
The model was evaluated for translation quality with the BLEU score and for efficiency by total evaluation runtime.
| Metric | Value |
|---|---|
| BLEU score | 0.1468 |
| Evaluation runtime | 1237.5341 seconds |
The BLEU score reflects a baseline level of English-to-Nyishi translation quality, consistent with the limited training data available. Substantial refinement is still needed, but the result is an encouraging first step toward accurate translation.
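The card does not state which BLEU implementation produced the score. For reference, standard corpus-level BLEU combines modified n-gram precisions $p_n$ with a brevity penalty $\mathrm{BP}$, where $c$ is the total length of the candidate translations and $r$ the effective reference length; the usual setting is $N = 4$ with uniform weights $w_n = 1/4$:

$$
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
$$

Defined this way, BLEU lies between 0 and 1, while some toolkits, such as sacreBLEU, report it multiplied by 100. If the reported 0.1468 is on the 0 to 1 scale, it corresponds to 14.68 on the 0 to 100 scale.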
Summary
The eng-nyi-nmt model is at an early stage of development and offers initial English-to-Nyishi translation capabilities. Expanding the dataset and increasing training resources are the main levers for improving generalization and translation accuracy to the level needed for practical applications in this extremely low-resource setting.