
Assamese Instruction Following Model using mT5-small

This project fine-tunes the mT5-small model for instruction-following tasks in Assamese. The model is designed to understand questions in Assamese and generate relevant responses.

Model Description

  • Base Model: google/mt5-small (Multilingual T5)
  • Fine-tuned on: Assamese instruction-following dataset
  • Task: Question answering and instruction following in Assamese
  • Training Device: Google Colab T4 GPU

Dataset

  • Total Examples: 28,910
  • Training Set: 23,128 examples
  • Validation Set: 5,782 examples
  • Format: Instruction-Input-Output pairs in Assamese
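
The card does not show the raw record schema; the sketch below illustrates one plausible instruction-input-output pair and a helper that flattens it into a single source/target string for mT5. The field names and the Assamese text are illustrative assumptions, not samples from the actual dataset.

# Illustrative record; the dataset's real field names may differ.
example = {
    "instruction": "তলৰ প্ৰশ্নৰ উত্তৰ দিয়ক।",  # "Answer the question below."
    "input": "অসমৰ ৰাজধানী কি?",              # "What is the capital of Assam?"
    "output": "দিছপুৰ।",                      # "Dispur."
}

def format_example(ex):
    # Concatenate instruction and (optional) input into one source string.
    source = ex["instruction"]
    if ex.get("input"):
        source += "\n" + ex["input"]
    return {"source": source, "target": ex["output"]}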

Training Configuration

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-assamese-instructions",  # required argument; path is a placeholder
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,
)
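
These arguments on their own do not launch a run; a minimal sketch of wiring them into a Seq2SeqTrainer follows. The preprocessing column names ("source"/"target"), the max lengths, and the train_dataset/eval_dataset variables are assumptions, since the card does not show the data-loading code.

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

def preprocess(batch):
    # Tokenize the instruction text and the reference answer separately;
    # the column names and max lengths are assumed, not taken from the card.
    model_inputs = tokenizer(batch["source"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# train_dataset / eval_dataset stand for the 23,128 / 5,782 example splits
# described above, already mapped through preprocess(); loading them is not
# shown in this card.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()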

Model Capabilities

The model can:

  • Process Assamese script input
  • Recognize different question types
  • Maintain basic Assamese grammar
  • Generate responses in Assamese

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # "How to succeed in life?"

# Generate and decode a response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Limitations

Current limitations include:

  • A tendency toward repetitive responses (see the decoding sketch below)
  • Limited coherence in longer answers
  • Basic response structure
  • Memory constraints from training on a single T4 GPU
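
Repetition of this kind can often be reduced at decoding time without retraining. The sketch below reuses model, tokenizer, and inputs from the usage example above; the specific parameter values are illustrative suggestions, not settings used for this model.

# Illustrative decoding settings to curb repetition; values are untested.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    no_repeat_ngram_size=3,   # block exact trigram repeats
    repetition_penalty=1.3,   # discourage re-sampling recent tokens
    early_stopping=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)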

Future Improvements

Planned improvements include:

  • Better response generation parameters
  • Enhanced data preprocessing
  • Structural markers in training data
  • Optimization for longer responses
  • Improved coherence in outputs


Citation

@misc{mt5-assamese-instructions,
  author = {NanduvardhanReddy},
  title = {mT5-small Fine-tuned for Assamese Instructions},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {Hugging Face Model Hub}
}

Acknowledgments

  • Google's mT5 team for the base model
  • Hugging Face for the Transformers library
  • Google Colab for compute resources

License

This project is licensed under the Apache License 2.0.