|
| 0.8238 | 11.88 | 1900 | 0.6735 | 0.3297 |
| 0.7618 | 12.5 | 2000 | 0.6728 | 0.3286 |

#### Hyperparameter Tuning

Several models with different hyperparameters were trained. The following figures show the training process for three of them.

![wer](wandb-wer.png)
![loss](wandb-loss.png)

'20_2000_1e-5_hp-mehrdad' is the current model and its hyperparameters are:
```python
model = Wav2Vec2ForCTC.from_pretrained(
    model_name_or_path if not last_checkpoint else last_checkpoint,
    # hp-mehrdad: Hyperparams of 'm3hrdadfi/wav2vec2-large-xlsr-persian-v3'
    attention_dropout = 0.05316,
    hidden_dropout = 0.01941,
    feat_proj_dropout = 0.01249,
    mask_time_prob = 0.04529,
    layerdrop = 0.01377,
    ctc_loss_reduction = 'mean',
    ctc_zero_infinity = True,
)

learning_rate = 1e-5
```

The hyperparameters of '19_2000_1e-5_hp-base' are:
```python
model = Wav2Vec2ForCTC.from_pretrained(
    model_name_or_path if not last_checkpoint else last_checkpoint,
    # hp-base: Hyperparams similar to 'facebook/wav2vec2-large-xlsr-53' or 'facebook/wav2vec2-xls-r-300m'
    attention_dropout = 0.1,
    hidden_dropout = 0.1,
    feat_proj_dropout = 0.1,
    mask_time_prob = 0.075,
    layerdrop = 0.1,
    ctc_loss_reduction = 'mean',
    ctc_zero_infinity = True,
)

learning_rate = 1e-5
```

And the hyperparameters of '22_2000_1e-5_hp-masoud' are:
```python
model = Wav2Vec2ForCTC.from_pretrained(
    model_name_or_path if not last_checkpoint else last_checkpoint,
    # hp-masoud: Hyperparams of 'masoudmzb/wav2vec2-xlsr-multilingual-53-fa'
    attention_dropout = 0.2,
    hidden_dropout = 0.2,
    feat_proj_dropout = 0.1,
    mask_time_prob = 0.2,
    layerdrop = 0.2,
    ctc_loss_reduction = 'mean',
    ctc_zero_infinity = True,
)

learning_rate = 1e-5
```

As the figures show, the current model ('20_2000_1e-5_hp-mehrdad') achieves a lower WER on the validation (evaluation) set than the other two.
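For convenience, the three hyperparameter sets above can also be collected into plain dictionaries and diffed programmatically. This is an illustrative sketch, not part of the training script; only the parameters that vary between the configurations are listed:

```python
# The three hyperparameter sets from the model configurations above.
configs = {
    "hp-mehrdad": {  # '20_2000_1e-5_hp-mehrdad' (current model)
        "attention_dropout": 0.05316,
        "hidden_dropout": 0.01941,
        "feat_proj_dropout": 0.01249,
        "mask_time_prob": 0.04529,
        "layerdrop": 0.01377,
    },
    "hp-base": {  # '19_2000_1e-5_hp-base'
        "attention_dropout": 0.1,
        "hidden_dropout": 0.1,
        "feat_proj_dropout": 0.1,
        "mask_time_prob": 0.075,
        "layerdrop": 0.1,
    },
    "hp-masoud": {  # '22_2000_1e-5_hp-masoud'
        "attention_dropout": 0.2,
        "hidden_dropout": 0.2,
        "feat_proj_dropout": 0.1,
        "mask_time_prob": 0.2,
        "layerdrop": 0.2,
    },
}

# List the parameters that take more than one value across the three runs.
keys = configs["hp-base"].keys()
differing = sorted(k for k in keys
                   if len({cfg[k] for cfg in configs.values()}) > 1)
print(differing)
```

(`ctc_loss_reduction`, `ctc_zero_infinity`, and the learning rate are identical across all three runs and are omitted here.)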

#### Framework versions