# Assamese Instruction Following Model using mT5-small

This project fine-tunes the mT5-small model for Assamese-language instruction-following tasks. The model is designed to understand questions in Assamese and generate relevant responses.

## Model Description

- Base Model: google/mt5-small (Multilingual T5)
- Fine-tuned on: Assamese instruction-following dataset
- Task: Question answering and instruction following in Assamese
- Training Device: Google Colab T4 GPU

## Dataset

- Total Examples: 28,910
- Training Set: 23,128 examples
- Validation Set: 5,782 examples
- Format: Instruction-Input-Output pairs in Assamese (see the preprocessing sketch below)

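Each example is flattened into a single source string and an Assamese target string before tokenization. Below is a minimal preprocessing sketch; the field names (`instruction`, `input`, `output`), the newline-joined prompt template, and the length limits are assumptions, since the exact dataset schema is not documented in this card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

def preprocess(example):
    # Assumed schema: "instruction", optional "input", and "output" keys.
    source = example["instruction"]
    if example.get("input"):
        source += "\n" + example["input"]
    model_inputs = tokenizer(source, max_length=512, truncation=True)
    # Tokenize the Assamese reference answer as decoder labels.
    labels = tokenizer(text_target=example["output"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
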
## Training Configuration

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./results",  # not in the original snippet; required by most versions
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=200,
    weight_decay=0.01,
    gradient_accumulation_steps=2,  # effective batch size of 8
)
```

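These arguments plug into the standard `Seq2SeqTrainer` loop. The wiring below is a sketch under stated assumptions: it reuses `tokenizer` and `training_args` from above, and `train_dataset`/`eval_dataset` stand in for the tokenized 23,128/5,782 splits, which are not defined in this card.

```python
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
# Pads inputs and labels dynamically per batch.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized training split
    eval_dataset=eval_dataset,    # assumed: tokenized validation split
    data_collator=data_collator,
)
trainer.train()
```
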
## Model Capabilities

The model can:

- Process Assamese script input
- Recognize different question types
- Maintain basic Assamese grammar
- Generate responses in Assamese

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/mt5-assamese-instructions")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/mt5-assamese-instructions")

# Example input
text = "জীৱনত কেনেকৈ সফল হ'ব?"  # "How to succeed in life?"

# Generate response
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Limitations

Current limitations include:

- Tendency for repetitive responses
- Limited coherence in longer answers
- Basic response structure
- Memory constraints due to the T4 GPU

## Future Improvements

Planned improvements include:

- Better response generation parameters (see the decoding sketch below)
- Enhanced data preprocessing
- Structural markers in training data
- Optimization for longer responses
- Improved coherence in outputs

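For the first item, repetition can often be reduced at decoding time without retraining. The settings below are illustrative assumptions, not tuned defaults shipped with the model; they reuse `model` and `inputs` from the usage example.

```python
# Hypothetical decoding settings to curb repetition; all values are assumptions.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,      # allow longer answers than the default cap
    num_beams=4,             # beam search for more coherent output
    no_repeat_ngram_size=3,  # block verbatim 3-gram repeats
    repetition_penalty=1.2,  # mildly discourage repeated tokens
    early_stopping=True,
)
```
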
## Citation

```bibtex
@misc{mt5-assamese-instructions,
  author    = {NanduvardhanReddy},
  title     = {mT5-small Fine-tuned for Assamese Instructions},
  year      = {2024},
  publisher = {Hugging Face},
  journal   = {Hugging Face Model Hub}
}
```

## Acknowledgments

- Google's mT5 team for the base model
- Hugging Face for the transformers library
- Google Colab for computation resources

## License

This project is licensed under the Apache License 2.0.