Assamese Instruction Following Model using mT5-small
This project fine-tunes the mT5-small model for instruction following in Assamese. The model is designed to understand questions written in Assamese and to generate relevant responses in the same language.
Model Description
- Base Model: google/mt5-small (Multilingual T5)
- Fine-tuned on: Assamese instruction-following dataset
- Task: Question answering and instruction following in Assamese
- Training Device: Google Colab T4 GPU
Dataset
- Total Examples: 28,910
- Training Set: 23,128 examples
- Validation Set: 5,782 examples
- Format: Instruction-Input-Output pairs in Assamese
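The counts above are consistent with an 80/20 train/validation split (23,128 + 5,782 = 28,910). The sketch below checks that arithmetic and shows one possible way to turn an Instruction-Input-Output record into a source/target pair. The function names, the `source`/`target` keys, and the newline joiner are assumptions made for illustration; the source does not say how the split or the formatting was actually done.

```python
def split_sizes(total: int, val_percent: int) -> tuple[int, int]:
    """Return (train, validation) sizes for an integer validation percentage."""
    val = total * val_percent // 100
    return total - val, val


def build_example(instruction: str, input_text: str, output: str) -> dict[str, str]:
    """Join the instruction and the optional input into one source string.

    The key names and the newline separator are hypothetical choices.
    """
    source = instruction.strip()
    if input_text.strip():
        source = f"{source}\n{input_text.strip()}"
    return {"source": source, "target": output.strip()}


train_size, val_size = split_sizes(28_910, 20)
print(train_size, val_size)  # 23128 5782
```

Integer arithmetic is used for the split so that the sizes are not affected by floating-point rounding.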
Training Configuration
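from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    num_train_epochs=2,              # two passes over the training set
    per_device_train_batch_size=4,   # small batches to fit T4 memory
    per_device_eval_batch_size=4,
    warmup_steps=200,                # learning-rate warmup steps
    weight_decay=0.01,
    gradient_accumulation_steps=2,   # accumulate two batches per update
)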
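To make the configuration above easier to interpret, the following sketch works out the effective batch size and the approximate number of optimizer updates. It assumes a single GPU and the 23,128 training examples listed under Dataset; `schedule_summary` is a hypothetical helper, and the Trainer's exact step accounting can differ slightly between library versions.

```python
import math


def schedule_summary(train_examples, per_device_batch, grad_accum,
                     epochs, warmup_steps, num_devices=1):
    """Estimate batch and step counts from the training arguments."""
    batches_per_epoch = math.ceil(train_examples / (per_device_batch * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_updates = updates_per_epoch * epochs
    return {
        "effective_batch_size": per_device_batch * num_devices * grad_accum,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_fraction": warmup_steps / total_updates,
    }


summary = schedule_summary(
    train_examples=23_128, per_device_batch=4, grad_accum=2,
    epochs=2, warmup_steps=200,
)
# effective_batch_size = 8, updates_per_epoch = 2891, total_updates = 5782
print(summary)
```

Under these assumptions, the 200 warmup steps cover roughly 3.5% of the 5,782 updates.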
Model Capabilities
The model can:
- Process Assamese script input
- Recognize different question types
- Maintain basic Assamese grammar
- Generate responses in Assamese
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer (replace "your-username" with the actual Hub namespace)
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # How to succeed in life?

# Generate response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Limitations
Current limitations include:
- Tendency toward repetitive responses
- Limited coherence in longer answers
- Basic response structure
- Memory constraints imposed by the T4 GPU
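Repetition of this kind can sometimes be reduced at decoding time rather than by retraining. The sketch below collects a few standard `generate` arguments from the transformers library into a helper. The specific values and the `generate_response` helper are illustrative assumptions, not settings reported for this model, and they would need tuning on real outputs.

```python
# Illustrative decoding settings; the values are assumptions, not tuned results.
GENERATION_KWARGS = {
    "max_new_tokens": 128,       # allow longer answers than the short default
    "num_beams": 4,              # beam search instead of greedy decoding
    "no_repeat_ngram_size": 3,   # never repeat the same 3-gram
    "repetition_penalty": 1.2,   # down-weight tokens that were already generated
    "early_stopping": True,      # stop once enough finished beams are found
}


def generate_response(model, tokenizer, text, **overrides):
    """Generate one response; keyword overrides replace the defaults above."""
    kwargs = {**GENERATION_KWARGS, **overrides}
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, **kwargs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With a loaded model and tokenizer, a call such as `generate_response(model, tokenizer, text, num_beams=2)` would override a single setting while keeping the rest.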
Future Improvements
Planned improvements include:
- Better response generation parameters
- Enhanced data preprocessing
- Structural markers in training data
- Optimization for longer responses
- Improved coherence in outputs
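As one possible reading of the preprocessing and structural-marker items above, the sketch below normalizes text to Unicode NFC, collapses whitespace, and prefixes each field with a marker. The marker strings and function names are hypothetical; the source does not specify what markers or preprocessing steps are planned.

```python
import re
import unicodedata

# Hypothetical marker strings; the source does not define any.
INSTRUCTION_MARKER = "Instruction:"
INPUT_MARKER = "Input:"


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def add_markers(instruction: str, input_text: str = "") -> str:
    """Prefix the instruction and the optional input with structural markers."""
    parts = [f"{INSTRUCTION_MARKER} {normalize_text(instruction)}"]
    if input_text.strip():
        parts.append(f"{INPUT_MARKER} {normalize_text(input_text)}")
    return " ".join(parts)


print(add_markers("জীৱনত কেনেকৈ সফল হ'ব?"))
```

Normalizing before tokenization keeps canonically equivalent character sequences consistent across examples, which may matter for scripts with combining marks.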
Citation
@misc{mt5-assamese-instructions,
    author = {NanduvardhanReddy},
    title = {mT5-small Fine-tuned for Assamese Instructions},
    year = {2024},
    publisher = {Hugging Face},
    journal = {Hugging Face Model Hub}
}
Acknowledgments
- Google's mT5 team for the base model
- Hugging Face for the transformers library
- Google Colab for computation resources
License
This project is licensed under the Apache License 2.0.