Update README.md
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The standard fine-tuning procedure was applied: batches were created from the training samples, and the model was evaluated on each epoch. The model weights are optimised using cross-entropy loss.
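For illustration, the described procedure (batching, cross-entropy optimisation, per-epoch evaluation) could be sketched as below. This is not the authors' code; `model` and the data loaders are placeholders, and AdamW is an assumed optimiser choice.

```python
import torch
import torch.nn as nn

def fine_tune(model, train_loader, val_loader, epochs=4, lr=2e-4, weight_decay=1e-3):
    # Defaults mirror the hyperparameters reported below; optimiser choice is assumed.
    optimiser = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(inputs), targets)  # cross-entropy, as stated above
            loss.backward()
            optimiser.step()
        # Evaluate on each epoch, as described.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(x := model(xb), yb).item() if False else loss_fn(model(xb), yb).item()
                           for xb, yb in val_loader) / len(val_loader)
    return val_loss
```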
#### Training Hyperparameters

A grid search was applied to find the best learning rate, number of epochs, weight decay, and batch size. The grid searched is as follows:
```python
gridSpace = {
    'batch_size': [4, 8],
    'lr_rate': [0.0002, 2e-4, 2e-5],
    'w_decay': [0.1, 0.01, 0.001]
}
```

Each grid configuration was combined with an epoch number from 1 to 16.
The setup chosen at the end of the experimentation stage was:

1) **batch_size**: 4
2) **learning_rate**: 0.0002
3) **weight_decay**: 0.001
4) **epoch number**: 4

This grid search was performed 3 separate times, and the chosen configuration achieved the lowest average validation loss of 0.01431.
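As an illustration (not the authors' code), the search space above, including the epoch range, can be enumerated with `itertools.product`:

```python
from itertools import product

# Grid from the README, plus the epoch range 1-16 described above.
grid_space = {
    'batch_size': [4, 8],
    'lr_rate': [0.0002, 2e-4, 2e-5],
    'w_decay': [0.1, 0.01, 0.001],
    'epochs': list(range(1, 17)),
}

keys = list(grid_space)
configs = [dict(zip(keys, values)) for values in product(*grid_space.values())]
# 2 * 3 * 3 * 16 = 288 candidate configurations per search run
print(len(configs))
```

Note that the reported learning-rate list contains `0.0002` and `2e-4`, which are the same value; it is reproduced here verbatim.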
## Evaluation

…
#### Summary

The evaluation shows that the fine-tuned model outperforms all other models across the chosen metrics, particularly precision. This implies that the model's strength lies in ensuring that the corrections it makes are, in fact, valid, as opposed to the other models, all of which exhibit a recall value much higher than their respective precision.
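To make the precision/recall contrast concrete, here is a minimal sketch (not the evaluation code used here) treating each model output as a set of proposed edits: precision counts how many of the corrections the model *made* are valid, while recall counts how many *needed* corrections it caught.

```python
def precision_recall(proposed, gold):
    """`proposed` and `gold` are sets of edits, e.g. (position, replacement) pairs."""
    true_positives = len(proposed & gold)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: the model proposes 2 edits, both valid, but 4 were needed.
p, r = precision_recall(
    {("a", 1), ("b", 2)},
    {("a", 1), ("b", 2), ("c", 3), ("d", 4)},
)
# p == 1.0 (every correction it made was valid), r == 0.5 (half were caught)
```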
<!--
## Citation [optional]