# Assamese Instruction Following Model using mT5-small

This project fine-tunes the mT5-small model for Assamese-language instruction-following tasks. The model is designed to understand questions in Assamese and generate relevant responses.

## Model Description

- Base Model: google/mt5-small (Multilingual T5)
- Fine-tuned on: Assamese instruction-following dataset
- Task: Question answering and instruction following in Assamese
- Training Device: Google Colab T4 GPU

## Dataset

- Total Examples: 28,910
- Training Set: 23,128 examples
- Validation Set: 5,782 examples
- Format: Instruction-Input-Output pairs in Assamese

## Training Configuration

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-assamese-instructions",  # placeholder path; not specified in the original
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,
)
```

## Model Capabilities

The model can:

- Process Assamese script input
- Recognize different question types
- Maintain basic Assamese grammar
- Generate responses in Assamese

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # How to succeed in life?

# Generate response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Limitations

Current limitations include:

- Tendency toward repetitive responses
- Limited coherence in longer answers
- Basic response structure
- Memory constraints of the T4 GPU

## Future Improvements

Planned improvements include:

- Better response generation parameters (see the sketch at the end of this README)
- Enhanced data preprocessing
- Structural markers in the training data
- Optimization for longer responses
- Improved coherence in outputs

## Citation

```bibtex
@misc{mt5-assamese-instructions,
  author    = {NanduvardhanReddy},
  title     = {mT5-small Fine-tuned for Assamese Instructions},
  year      = {2024},
  publisher = {Hugging Face},
  journal   = {Hugging Face Model Hub}
}
```

## Acknowledgments

- Google's mT5 team for the base model
- Hugging Face for the Transformers library
- Google Colab for computation resources

## License

This project is licensed under the Apache License 2.0.
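
## Appendix: Generation Parameter Sketch

As a possible direction for the "better response generation parameters" item above, the sketch below passes standard Hugging Face decoding options to `model.generate` to discourage repetition and allow longer answers. The specific values are illustrative assumptions only; they were not used or validated for this model.

```python
# A minimal sketch of repetition-reducing decoding settings (untested values; tune as needed).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

inputs = tokenizer("জীৱনত কেনেকৈ সফল হ'ব?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,      # allow longer answers than the default limit
    num_beams=4,             # beam search for more coherent output
    no_repeat_ngram_size=3,  # block repeated trigrams
    repetition_penalty=1.2,  # penalize already-generated tokens
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```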