
Assamese Instruction Following Model using mT5-small

This project fine-tunes the mT5-small model for instruction-following tasks in Assamese. The model is designed to understand questions in Assamese and generate relevant responses.

Model Description

  • Base Model: google/mt5-small (Multilingual T5)
  • Fine-tuned on: Assamese instruction-following dataset
  • Task: Question answering and instruction following in Assamese
  • Training Device: Google Colab T4 GPU

Dataset

  • Total Examples: 28,910
  • Training Set: 23,128 examples
  • Validation Set: 5,782 examples
  • Format: Instruction-Input-Output pairs in Assamese
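
The card does not show the raw record schema; the sketch below illustrates one plausible instruction-input-output pair and a helper that flattens it into a single source/target string for mT5. The field names and the Assamese text are illustrative assumptions, not samples from the actual dataset.

# Illustrative record; the dataset's real field names may differ.
example = {
    "instruction": "তলৰ প্ৰশ্নৰ উত্তৰ দিয়ক।",  # "Answer the question below."
    "input": "অসমৰ ৰাজধানী কি?",              # "What is the capital of Assam?"
    "output": "দিছপুৰ।",                      # "Dispur."
}

def format_example(ex):
    # Concatenate instruction and (optional) input into one source string.
    source = ex["instruction"]
    if ex.get("input"):
        source += "\n" + ex["input"]
    return {"source": source, "target": ex["output"]}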

Training Configuration

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-assamese-instructions",  # required argument; path is a placeholder
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,
)
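
These arguments on their own do not launch a run; a minimal sketch of wiring them into a Seq2SeqTrainer follows. The preprocessing column names ("source"/"target"), the max lengths, and the train_dataset/eval_dataset variables are assumptions, since the card does not show the data-loading code.

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

def preprocess(batch):
    # Tokenize the instruction text and the reference answer separately;
    # the column names and max lengths are assumed, not taken from the card.
    model_inputs = tokenizer(batch["source"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# train_dataset / eval_dataset stand for the 23,128 / 5,782 example splits
# described above, already mapped through preprocess(); loading them is not
# shown in this card.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()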

Model Capabilities

The model can:

  • Process Assamese script input
  • Recognize different question types
  • Maintain basic Assamese grammar
  • Generate responses in Assamese

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # "How to succeed in life?"

# Generate and decode a response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Limitations

Current limitations include:

  • A tendency toward repetitive responses (see the decoding sketch below)
  • Limited coherence in longer answers
  • Basic response structure
  • Memory constraints from training on a single T4 GPU
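
Repetition of this kind can often be reduced at decoding time without retraining. The sketch below reuses model, tokenizer, and inputs from the usage example above; the specific parameter values are illustrative suggestions, not settings used for this model.

# Illustrative decoding settings to curb repetition; values are untested.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    no_repeat_ngram_size=3,   # block exact trigram repeats
    repetition_penalty=1.3,   # discourage re-sampling recent tokens
    early_stopping=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)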

Future Improvements

Planned improvements include:

  • Better response generation parameters
  • Enhanced data preprocessing
  • Structural markers in training data
  • Optimization for longer responses
  • Improved coherence in outputs


Citation

@misc{mt5-assamese-instructions,
  author = {NanduvardhanReddy},
  title = {mT5-small Fine-tuned for Assamese Instructions},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {Hugging Face Model Hub}
}

Acknowledgments

  • Google's mT5 team for the base model
  • Hugging Face for the Transformers library
  • Google Colab for compute resources

License

This project is licensed under the Apache License 2.0.