|
# Assamese Instruction Following Model using mT5-small |
|
|
|
This project fine-tunes the mT5-small model for instruction-following tasks in Assamese. The model is designed to understand questions written in Assamese and to generate relevant responses in Assamese.
|
|
|
## Model Description |
|
|
|
- Base Model: google/mt5-small (Multilingual T5) |
|
- Fine-tuned on: Assamese instruction-following dataset |
|
- Task: Question answering and instruction following in Assamese |
|
- Training Device: Google Colab T4 GPU |
|
|
|
## Dataset |
|
|
|
- Total Examples: 28,910 |
|
- Training Set: 23,128 examples |
|
- Validation Set: 5,782 examples |
|
- Format: Instruction-Input-Output pairs in Assamese |
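
Each example pairs an Assamese instruction (optionally with an input field) with a target output. As a rough illustration, a record and the source/target strings fed to mT5 might look like the sketch below; the field names and the way the fields are concatenated are assumptions for illustration, not the dataset's verified schema.

```python
# Hypothetical record layout; the actual field names in the dataset may differ.
example = {
    "instruction": "তলৰ প্ৰশ্নটোৰ উত্তৰ দিয়ক।",      # "Answer the question below."
    "input": "অসমৰ ৰাজধানী ক'ত অৱস্থিত?",            # "Where is the capital of Assam located?"
    "output": "অসমৰ ৰাজধানী দিছপুৰত অৱস্থিত।",        # "The capital of Assam is located in Dispur."
}

# One simple way to collapse instruction + input into a single source string
# for a seq2seq model such as mT5.
source_text = example["instruction"] + "\n" + example["input"]
target_text = example["output"]
```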
|
|
|
## Training Configuration |
|
|
|
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-assamese-instructions",  # required; any local checkpoint path
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,
)
```
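
For context, these arguments would typically be passed to a `Seq2SeqTrainer` together with the tokenized splits. The sketch below shows one plausible way to wire this up; `train_dataset` and `eval_dataset` are placeholders for the tokenized 23,128 / 5,782 example splits described above, not the project's actual variable names.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# train_dataset / eval_dataset: assumed to be the tokenized training and
# validation splits prepared earlier (not shown here).
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```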
|
|
|
## Model Capabilities

The model can:

- Process Assamese-script input
- Recognize different question types
- Maintain basic Assamese grammar
- Generate responses in Assamese
|
|
|
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # "How to succeed in life?"

# Generate response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
|
|
|
## Limitations

Current limitations include:

- Tendency toward repetitive responses (see the decoding sketch after this list)
- Limited coherence in longer answers
- Basic response structure
- Memory constraints due to training on a T4 GPU
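
Repetition can often be reduced at inference time without retraining. Continuing from the usage example above, the snippet below is a sketch of commonly used `generate()` settings (beam search, n-gram blocking, a longer length budget); the specific values are illustrative defaults, not tuned for this model.

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=128,        # allow longer answers than the default budget
    num_beams=4,               # beam search instead of greedy decoding
    no_repeat_ngram_size=3,    # block repeated trigrams
    repetition_penalty=1.2,    # mildly discourage repeated tokens
    early_stopping=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```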
|
|
|
## Future Improvements

Planned improvements include:

- Better response generation parameters
- Enhanced data preprocessing
- Structural markers in training data (see the sketch after this list)
- Optimization for longer responses
- Improved coherence in outputs
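
One way to add structural markers is to wrap each field in an explicit tag when building the training prompt, so the model can distinguish the instruction from any accompanying input. The helper and tag names below are hypothetical and are not part of the current training pipeline; they only sketch the idea.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Hypothetical prompt template with explicit structural markers."""
    prompt = f"<instruction> {instruction} </instruction>"
    if context:
        prompt += f" <input> {context} </input>"
    return prompt

# Example:
# build_prompt("জীৱনত কেনেকৈ সফল হ'ব?")
# -> "<instruction> জীৱনত কেনেকৈ সফল হ'ব? </instruction>"
```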
|
|
|
|
|
## Citation

```bibtex
@misc{mt5-assamese-instructions,
  author       = {NanduvardhanReddy},
  title        = {mT5-small Fine-tuned for Assamese Instructions},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {Hugging Face Model Hub}
}
```
|
|
|
## Acknowledgments

- Google's mT5 team for the base model
- Hugging Face for the `transformers` library
- Google Colab for computation resources
|
|
|
## License

This project is licensed under the Apache License 2.0.
|
|