Access: this repository is publicly accessible, but you must accept its conditions, including sharing your contact information, before you can access its files and content.

Model Card for eng-tagin-nmt

The eng-tagin-nmt model is a neural machine translation (NMT) model fine-tuned on the GinLish Corpus v0.1.0 (under development), which consists of English and Tagin language pairs. Tagin, an extremely low-resource language spoken in Arunachal Pradesh, India, faces challenges due to a scarcity of digital resources and linguistic datasets. The goal of this model is to provide translation support for Tagin, helping to preserve and promote its use in digital spaces.

To develop eng-tagin-nmt, the pre-trained model Helsinki-NLP/opus-mt-en-hi (English-to-Hindi) was used as the foundation, chosen for the structural similarities between Hindi and Tagin. Transfer learning from this model allowed the Tagin translation model to be adapted efficiently despite the limited language data.

Model Details

Model Description

  • Developed by: Tungon Dugi
  • Affiliation: National Institute of Technology Arunachal Pradesh, India
  • Email: tungondugi@gmail.com or tungon.phd24@nitap.ac.in
  • Model type: Translation
  • Language(s) (NLP): English (en) and Tagin (tag)
  • Finetuned from model: Helsinki-NLP/opus-mt-en-hi

Uses

Direct Use

This model can be used directly for English-to-Tagin translation, a text-to-text generation task.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-tagin-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-tagin-nmt")
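
Loading the tokenizer and model is only the first step. A minimal sketch of running a translation might look like the following. The helper names `load_translator` and `translate` and the `max_new_tokens` default are illustrative assumptions, not part of this model card. Downloading the files assumes you have accepted the repository's access conditions and are logged in to the Hugging Face Hub, and `return_tensors="pt"` assumes PyTorch is installed.

```python
def load_translator(model_id="repleeka/eng-tagin-nmt"):
    """Load the tokenizer and model from the Hugging Face Hub.

    Hypothetical helper: the import is deferred so that translate()
    can be reused without loading transformers up front.
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences and return decoded strings.

    max_new_tokens=128 is an assumed limit, not a documented setting.
    """
    # Tokenize the batch, padding to the longest sentence.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate target-language token IDs with the model's default decoding settings.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Convert token IDs back to text, dropping padding and end-of-sequence markers.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With access granted, `tokenizer, model = load_translator()` followed by `translate(["How are you?"], tokenizer, model)` would return a list containing one Tagin string per input sentence.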

Training Details

Training Data

GinLish Corpus v0.1.0 (under development), a corpus of English and Tagin language pairs.

Evaluation

The model achieved the following metrics after 10 training epochs:

Metric               Value
BLEU Score           27.9589
Evaluation Runtime   670.2117 seconds

A BLEU score of 27.96 is a promising result for an extremely low-resource language pair and indicates usable translation quality on the GinLish Corpus, although the evaluation loss itself is not listed in this card. The model is a meaningful step forward for Tagin language resources, enabling English-to-Tagin translation in NLP applications.
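
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score is typically computed: clipped n-gram precisions for orders 1 to 4, combined by a geometric mean and scaled by a brevity penalty. The card does not say which tool or tokenizer produced the reported score, so this is an illustration of the metric under simple assumptions (whitespace tokenization, one reference per sentence, no smoothing), not the evaluation script used here. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumes whitespace tokenization and no smoothing, so scores will
    differ from tools that apply their own tokenizers.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram by its reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score when output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Because the score depends heavily on tokenization and on the test set, BLEU values are only comparable when computed with the same tool and settings on the same data.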

Summary

The eng-tagin-nmt model is currently in its early phase of development. To enhance its performance, it requires a more substantial dataset and improved training resources. This would facilitate better generalization and accuracy in translating between English and Tagin, addressing the challenges faced by this extremely low-resource language. As the model evolves, ongoing efforts will be necessary to refine its capabilities further.

Model size: 77.5M params (Safetensors, tensor type F32)