Update README.md
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The standard fine-tuning procedure was applied: batches were created from the training samples, and the model was evaluated on each epoch. The model weights are optimised using cross-entropy loss.
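For illustration, the described procedure (batching, cross-entropy optimisation, per-epoch evaluation) could be sketched as below. This is not the authors' code; `model` and the data loaders are placeholders, and AdamW is an assumed optimiser choice.

```python
import torch
import torch.nn as nn

def fine_tune(model, train_loader, val_loader, epochs=4, lr=2e-4, weight_decay=1e-3):
    # Defaults mirror the hyperparameters reported below; optimiser choice is assumed.
    optimiser = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for inputs, targets in train_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(inputs), targets)  # cross-entropy, as stated above
            loss.backward()
            optimiser.step()
        # Evaluate on each epoch, as described.
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(x := model(xb), yb).item() if False else loss_fn(model(xb), yb).item()
                           for xb, yb in val_loader) / len(val_loader)
    return val_loss
```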
#### Training Hyperparameters

A grid search was applied to find the best learning rate, number of epochs, weight decay, and batch size. The grid searched is as follows:
```python
gridSpace = {
    'batch_size': [4, 8],
    'lr_rate': [0.0002, 2e-4, 2e-5],
    'w_decay': [0.1, 0.01, 0.001]
}
```

Each grid configuration was combined with an epoch number from 1 to 16.
The setup chosen at the end of the experimentation stage was:

1) **batch_size**: 4
2) **learning_rate**: 0.0002
3) **weight_decay**: 0.001
4) **epoch number**: 4

This grid search was performed 3 separate times, and the chosen configuration achieved the lowest average validation loss of 0.01431.
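As an illustration (not the authors' code), the search space above, including the epoch range, can be enumerated with `itertools.product`:

```python
from itertools import product

# Grid from the README, plus the epoch range 1-16 described above.
grid_space = {
    'batch_size': [4, 8],
    'lr_rate': [0.0002, 2e-4, 2e-5],
    'w_decay': [0.1, 0.01, 0.001],
    'epochs': list(range(1, 17)),
}

keys = list(grid_space)
configs = [dict(zip(keys, values)) for values in product(*grid_space.values())]
# 2 * 3 * 3 * 16 = 288 candidate configurations per search run
print(len(configs))
```

Note that the reported learning-rate list contains `0.0002` and `2e-4`, which are the same value; it is reproduced here verbatim.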
## Evaluation

…
#### Summary

The evaluation shows that the fine-tuned model outperforms all other models across the chosen metrics, particularly precision. This implies that the model's strength lies in ensuring that the corrections it makes are, in fact, valid, as opposed to the other models, all of which exhibit a recall value much higher than their respective precision.
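To make the precision/recall contrast concrete, here is a minimal sketch (not the evaluation code used here) treating each model output as a set of proposed edits: precision counts how many of the corrections the model *made* are valid, while recall counts how many *needed* corrections it caught.

```python
def precision_recall(proposed, gold):
    """`proposed` and `gold` are sets of edits, e.g. (position, replacement) pairs."""
    true_positives = len(proposed & gold)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: the model proposes 2 edits, both valid, but 4 were needed.
p, r = precision_recall(
    {("a", 1), ("b", 2)},
    {("a", 1), ("b", 2), ("c", 3), ("d", 4)},
)
# p == 1.0 (every correction it made was valid), r == 0.5 (half were caught)
```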
<!--
## Citation [optional]