thebogko committed
Commit cb20471
1 Parent(s): 8f1d478

Update README.md

Files changed (1):
  1. README.md +16 -4
README.md CHANGED
@@ -104,17 +104,29 @@ After filtering only these two types we are left with 3090 pairs, which were the
 ### Training Procedure

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- The standard fine-tuning training procedure was applied by creating batches from the training samples and eavluating on each epoch. The model weights are optimised using cross-entropy loss.
+ The standard fine-tuning training procedure was applied by creating batches from the training samples and evaluating on each epoch. The model weights are optimised using cross-entropy loss.

 #### Training Hyperparameters

- Gridspace search was applied to find the best learning rate, epoch number, weight decay and batch size. The chosen setup at the end of expimentation stage was chosen to be:
+ A grid search was applied to find the best learning rate, epoch number, weight decay and batch size. The grid searched is as follows:
+
+ ```
+ gridSpace = {
+     'batch_size': [4, 8],
+     'lr_rate': [0.0002, 2e-4, 2e-5],
+     'w_decay': [0.1, 0.01, 0.001]
+ }
+ ```
+
+ Along with this, epoch numbers from 1 to 16 were searched.
+
+ The setup chosen at the end of the experimentation stage was:
 1) **batch_size**: 4
 2) **learning_rate**: 0.0002
 3) **weight_decay**: 0.001
 4) **epoch number**: 4

- This gridspace search was performed 3 separate times, and it resulted in the lowest avearge validation loss of 0.01431.
+ This grid search was performed 3 separate times, and it resulted in the lowest average validation loss of 0.01431.

 ## Evaluation

@@ -169,7 +181,7 @@ The results are averaged over the testing pairs.

 #### Summary

- The evaluation showcases that the fine-tuned model ourperforms all other models across the chosen metrics, particularly precision. This implies that the model's strength lies in being able to ensure that the corrections it makes are, in fact, valid, as opposed to the other models, all of which exhibit a recall value that's much higher than their respecrive precision.
+ The evaluation shows that the fine-tuned model outperforms all other models across the chosen metrics, particularly precision. This implies that the model's strength lies in ensuring that the corrections it makes are, in fact, valid, as opposed to the other models, all of which exhibit a recall value much higher than their respective precision.

 <!--
 ## Citation [optional]
 
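For concreteness, here is a minimal sketch of the fine-tuning loop the diff above describes: batching the training pairs, optimising with cross-entropy loss, and evaluating on each epoch. The card ships no training code, so the base checkpoint, the placeholder data and the collation details below are assumptions (a generic seq2seq setup), not the author's implementation.

```
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder checkpoint and data: the card fine-tunes on 3090 error/correction
# pairs but does not show this code; everything named here is illustrative.
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

train_pairs = [("He go to school.", "He goes to school.")]  # (erroneous, corrected)
val_pairs = [("She have a cat.", "She has a cat.")]

def collate(batch):
    sources, targets = zip(*batch)
    enc = tokenizer(list(sources), padding=True, truncation=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, truncation=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # exclude padding from the loss
    enc["labels"] = labels
    return enc

train_loader = DataLoader(train_pairs, batch_size=4, shuffle=True, collate_fn=collate)
val_loader = DataLoader(val_pairs, batch_size=4, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.001)

for epoch in range(4):                       # chosen epoch number: 4
    model.train()
    for batch in train_loader:
        loss = model(**batch).loss           # cross-entropy loss over target tokens
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.eval()                             # evaluate on each epoch
    with torch.no_grad():
        val_loss = sum(model(**b).loss.item() for b in val_loader) / len(val_loader)
    print(f"epoch {epoch + 1}: validation loss {val_loss:.5f}")
```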
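And a sketch of how the grid search could be wired up. One plausible reading of "performed 3 separate times" is that each configuration's validation loss is averaged over 3 runs; `train_and_validate` is a hypothetical helper standing in for the loop above, not part of the card.

```
from itertools import product
from statistics import mean

# Hypothetical driver for the grid search in the diff; train_and_validate is an
# assumed helper (train with the given settings, return final validation loss).
gridSpace = {
    'batch_size': [4, 8],
    'lr_rate': [0.0002, 2e-4, 2e-5],  # as written in the card; note 0.0002 and 2e-4 are the same value
    'w_decay': [0.1, 0.01, 0.001]
}
epoch_numbers = range(1, 17)          # epoch numbers from 1 to 16

best_loss, best_config = float("inf"), None
for bs, lr, wd, n_epochs in product(gridSpace['batch_size'],
                                    gridSpace['lr_rate'],
                                    gridSpace['w_decay'],
                                    epoch_numbers):
    # Average validation loss over 3 separate runs, per the card's description
    avg_loss = mean(train_and_validate(batch_size=bs, lr=lr,
                                       weight_decay=wd, epochs=n_epochs)
                    for _ in range(3))
    if avg_loss < best_loss:
        best_loss, best_config = avg_loss, {'batch_size': bs, 'lr': lr,
                                            'weight_decay': wd, 'epochs': n_epochs}

# Reported winner: batch_size=4, lr=2e-4, weight_decay=0.001, epochs=4,
# with an average validation loss of 0.01431.
print(best_loss, best_config)
```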
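To make the precision/recall contrast in the summary concrete: for a correction model, precision is the share of edits it makes that are valid, while recall is the share of needed edits it actually makes. A toy sketch, under the assumption that edits are comparable as sets (the card does not name its scorer):

```
# Illustrative only: precision/recall over correction edits, assuming edits can
# be compared as sets. The card does not specify the exact scorer it used.
def precision_recall(predicted_edits: set, gold_edits: set) -> tuple[float, float]:
    true_positives = len(predicted_edits & gold_edits)
    precision = true_positives / len(predicted_edits) if predicted_edits else 1.0
    recall = true_positives / len(gold_edits) if gold_edits else 1.0
    return precision, recall

# A high-precision corrector edits conservatively and is usually right when it
# does; a high-recall, low-precision one catches more errors but also rewrites
# text that was already correct.
```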