lucio committed
Commit 63d0ec4
1 Parent(s): 4d253b0

final model and eval

README.md CHANGED
@@ -1,39 +1,66 @@
  ---
  license: apache-2.0
  tags:
  - generated_from_trainer
  datasets:
- - common_voice
  model-index:
- - name: xls-r-kyrgiz-cv8
-   results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # xls-r-kyrgiz-cv8

- This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5495
- - Wer: 0.2951
- - Cer: 0.0789

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:

  ---
+ language:
+ - ky
  license: apache-2.0
  tags:
+ - automatic-speech-recognition
+ - mozilla-foundation/common_voice_8_0
  - generated_from_trainer
+ - robust-speech-event
  datasets:
+ - mozilla-foundation/common_voice_8_0
  model-index:
+ - name: XLS-R-300M Kyrgiz CV8
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 8
+       type: mozilla-foundation/common_voice_8_0
+       args: ky
+     metrics:
+     - name: Test WER
+       type: wer
+       value: 31.28
+     - name: Test CER
+       type: cer
+       value: 7.66
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # XLS-R-300M Kyrgiz CV8

+ This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - KY dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.5497
+ - Wer: 0.2945
+ - Cer: 0.0791
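
As context for the Wer/Cer figures above, a minimal sketch of how such metrics are typically computed with the `evaluate` library; the reference and prediction strings are toy placeholders, not drawn from the Common Voice test set:

```python
# Sketch: computing WER and CER the way model-card eval scripts typically do.
# The two strings are invented placeholders, not real Common Voice transcripts.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["бул бир мисал"]   # ground-truth transcript (placeholder)
predictions = ["бул бер мисал"]  # model output (placeholder)

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```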

  ## Model description

+ For a description of the model architecture, see [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m).
+
+ The model vocabulary consists of the Cyrillic alphabet, with punctuation removed.
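
To make the vocabulary claim above concrete, a minimal sketch for inspecting the tokenizer's character set; the repo ID `lucio/xls-r-kyrgiz-cv8` is an assumption inferred from the committer and model name, not stated in the card:

```python
# Sketch: list the CTC vocabulary of the fine-tuned checkpoint.
# "lucio/xls-r-kyrgiz-cv8" is an assumed repo ID; substitute the real model path.
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("lucio/xls-r-kyrgiz-cv8")
vocab = processor.tokenizer.get_vocab()
# Expect Cyrillic letters plus CTC specials such as [PAD], [UNK] and the word delimiter.
print(sorted(vocab, key=vocab.get))
```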

  ## Intended uses & limitations

+ This model is expected to be of some utility for low-fidelity use cases such as:
+ - Draft video captions
+ - Indexing of recorded broadcasts
+
+ The model is not reliable enough to use as a substitute for live captions for accessibility purposes, and it should not be used in a manner that would infringe the privacy of any of the contributors to the Common Voice dataset or of any other speakers.
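
For the low-fidelity use cases listed above, a minimal transcription sketch; the repo ID and the audio filename are placeholders (the actual model path is not given in this card), and the input is assumed to be 16 kHz mono audio:

```python
# Sketch: transcribe a short 16 kHz mono recording with the ASR pipeline.
# "lucio/xls-r-kyrgiz-cv8" and "clip.wav" are placeholders, not confirmed by the card.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="lucio/xls-r-kyrgiz-cv8")
result = asr("clip.wav", chunk_length_s=30)  # chunking keeps long recordings manageable
print(result["text"])
```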

  ## Training and evaluation data

+ The combination of the `train` and `dev` splits from the official Common Voice release was used as training data. Half of the official `test` split was used as validation data and for the final evaluation.
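
A minimal sketch of how such a split could be assembled with the `datasets` library; the 50% slice is an assumption based on the card's wording ("half of the official test split"), and the exact boundaries used for training are not recorded here:

```python
# Sketch: build training and evaluation sets from Common Voice 8 Kyrgyz.
# The slice boundaries are assumptions; the card only says half of `test` was held out.
from datasets import load_dataset

train_data = load_dataset(
    "mozilla-foundation/common_voice_8_0", "ky",
    split="train+validation", use_auth_token=True,
)
eval_data = load_dataset(
    "mozilla-foundation/common_voice_8_0", "ky",
    split="test[:50%]", use_auth_token=True,
)
print(len(train_data), len(eval_data))  # the card's logs report 3509 train and 807 eval samples
```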

  ## Training procedure

+ The featurization layers of the XLS-R model are frozen while tuning a final CTC/LM layer on the Kyrgyz CV8 example sentences. A ramped learning-rate schedule is used, with an initial warmup phase of 500 steps up to a maximum of 1e-4, followed by a decay back toward 0 over the remainder of the 8100 steps (300 epochs).
+
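A minimal sketch of the freezing and learning-rate setup described above, assuming the standard `transformers` Trainer API; the hyperparameter values are taken from the card, but the surrounding setup (tokenizer, vocabulary size, data collator, batch sizes) is omitted and the argument names are illustrative rather than copied from the actual training script:

```python
# Sketch: freeze the feature-extraction (CNN) layers and configure the ramped LR schedule.
# Values (500 warmup steps, peak 1e-4, 300 epochs / 8100 steps) come from the card;
# everything else here is illustrative. In practice the CTC head must be sized to the
# Kyrgyz tokenizer (e.g. vocab_size=len(tokenizer)) when loading the base checkpoint.
from transformers import Wav2Vec2ForCTC, TrainingArguments, Trainer

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m")
model.freeze_feature_encoder()  # keep the featurization layers fixed during fine-tuning

args = TrainingArguments(
    output_dir="xls-r-kyrgiz-cv8",
    learning_rate=1e-4,          # peak learning rate
    warmup_steps=500,            # ramp up over the first 500 steps
    num_train_epochs=300,        # 8100 optimizer steps in total
    lr_scheduler_type="linear",  # then decay back toward 0
)
# trainer = Trainer(model=model, args=args, train_dataset=..., eval_dataset=...)
# trainer.train()
```

The schedule shape (500-step warmup, then decay to 0 by step 8100) matches the learning-rate trace recorded in `trainer_state.json` below.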
  ### Training hyperparameters

  The following hyperparameters were used during training:
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+     "epoch": 299.98,
+     "eval_cer": 0.07907625868969875,
+     "eval_loss": 0.5497409105300903,
+     "eval_runtime": 34.2437,
+     "eval_samples": 807,
+     "eval_samples_per_second": 23.566,
+     "eval_steps_per_second": 2.949,
+     "eval_wer": 0.29454945861625986,
+     "train_loss": 0.7936776961809323,
+     "train_runtime": 54993.0112,
+     "train_samples": 3509,
+     "train_samples_per_second": 19.142,
+     "train_steps_per_second": 0.147
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 299.98,
+     "eval_cer": 0.07907625868969875,
+     "eval_loss": 0.5497409105300903,
+     "eval_runtime": 34.2437,
+     "eval_samples": 807,
+     "eval_samples_per_second": 23.566,
+     "eval_steps_per_second": 2.949,
+     "eval_wer": 0.29454945861625986
+ }
mozilla-foundation_common_voice_8_0_ky_test_eval_results.txt CHANGED
@@ -1,2 +1,2 @@
- WER: 0.28576669112252384
- CER: 0.07143709162237744
+ WER: 0.27494497432134996
+ CER: 0.0674079866403361
runs/Feb05_21-45-17_job-699ba53c-fea9-4eb2-81af-a97f440eaa45/events.out.tfevents.1644152736.job-699ba53c-fea9-4eb2-81af-a97f440eaa45.2077552.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5ddbb74a4925c37a487023c1ff094be9e1f186e8f00d07432341684a6d3b633
+ size 405
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 299.98,
+     "train_loss": 0.7936776961809323,
+     "train_runtime": 54993.0112,
+     "train_samples": 3509,
+     "train_samples_per_second": 19.142,
+     "train_steps_per_second": 0.147
+ }
trainer_state.json ADDED
@@ -0,0 +1,671 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 299.9818181818182,
+   "global_step": 8100,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 3.69,
+       "learning_rate": 1.9600000000000002e-05,
+       "loss": 12.2644,
+       "step": 100
+     },
+     {
+       "epoch": 7.4,
+       "learning_rate": 3.960000000000001e-05,
+       "loss": 4.6184,
+       "step": 200
+     },
+     {
+       "epoch": 11.11,
+       "learning_rate": 5.96e-05,
+       "loss": 3.5365,
+       "step": 300
+     },
+     {
+       "epoch": 14.8,
+       "learning_rate": 7.960000000000001e-05,
+       "loss": 3.2788,
+       "step": 400
+     },
+     {
+       "epoch": 18.51,
+       "learning_rate": 9.960000000000001e-05,
+       "loss": 3.1079,
+       "step": 500
+     },
+     {
+       "epoch": 18.51,
+       "eval_cer": 0.9825416052243522,
+       "eval_loss": 2.679511547088623,
+       "eval_runtime": 34.6418,
+       "eval_samples_per_second": 23.296,
+       "eval_steps_per_second": 2.916,
+       "eval_wer": 0.9996329601761791,
+       "step": 500
+     },
+     {
+       "epoch": 22.22,
+       "learning_rate": 9.871052631578948e-05,
+       "loss": 2.193,
+       "step": 600
+     },
+     {
+       "epoch": 25.91,
+       "learning_rate": 9.739473684210527e-05,
+       "loss": 1.3185,
+       "step": 700
+     },
+     {
+       "epoch": 29.62,
+       "learning_rate": 9.607894736842105e-05,
+       "loss": 1.0508,
+       "step": 800
+     },
+     {
+       "epoch": 33.33,
+       "learning_rate": 9.476315789473684e-05,
+       "loss": 0.9319,
+       "step": 900
+     },
+     {
+       "epoch": 37.04,
+       "learning_rate": 9.344736842105263e-05,
+       "loss": 0.8506,
+       "step": 1000
+     },
+     {
+       "epoch": 37.04,
+       "eval_cer": 0.096087002317253,
+       "eval_loss": 0.43226873874664307,
+       "eval_runtime": 34.4996,
+       "eval_samples_per_second": 23.392,
+       "eval_steps_per_second": 2.928,
+       "eval_wer": 0.37181134153055606,
+       "step": 1000
+     },
+     {
+       "epoch": 40.73,
+       "learning_rate": 9.213157894736843e-05,
+       "loss": 0.8002,
+       "step": 1100
+     },
+     {
+       "epoch": 44.44,
+       "learning_rate": 9.081578947368421e-05,
+       "loss": 0.765,
+       "step": 1200
+     },
+     {
+       "epoch": 48.15,
+       "learning_rate": 8.950000000000001e-05,
+       "loss": 0.7327,
+       "step": 1300
+     },
+     {
+       "epoch": 51.84,
+       "learning_rate": 8.818421052631579e-05,
+       "loss": 0.7094,
+       "step": 1400
+     },
+     {
+       "epoch": 55.55,
+       "learning_rate": 8.686842105263159e-05,
+       "loss": 0.6821,
+       "step": 1500
+     },
+     {
+       "epoch": 55.55,
+       "eval_cer": 0.0877922898672846,
+       "eval_loss": 0.41051027178764343,
+       "eval_runtime": 34.0911,
+       "eval_samples_per_second": 23.672,
+       "eval_steps_per_second": 2.963,
+       "eval_wer": 0.3310699210864379,
+       "step": 1500
+     },
+     {
+       "epoch": 59.25,
+       "learning_rate": 8.555263157894737e-05,
+       "loss": 0.6763,
+       "step": 1600
+     },
+     {
+       "epoch": 62.95,
+       "learning_rate": 8.423684210526316e-05,
+       "loss": 0.6484,
+       "step": 1700
+     },
+     {
+       "epoch": 66.65,
+       "learning_rate": 8.292105263157896e-05,
+       "loss": 0.6371,
+       "step": 1800
+     },
+     {
+       "epoch": 70.36,
+       "learning_rate": 8.160526315789474e-05,
+       "loss": 0.6149,
+       "step": 1900
+     },
+     {
+       "epoch": 74.07,
+       "learning_rate": 8.028947368421052e-05,
+       "loss": 0.6091,
+       "step": 2000
+     },
+     {
+       "epoch": 74.07,
+       "eval_cer": 0.08513271539919949,
+       "eval_loss": 0.4281017482280731,
+       "eval_runtime": 36.3671,
+       "eval_samples_per_second": 22.19,
+       "eval_steps_per_second": 2.777,
+       "eval_wer": 0.3167553679574234,
+       "step": 2000
+     },
+     {
+       "epoch": 77.76,
+       "learning_rate": 7.897368421052632e-05,
+       "loss": 0.5908,
+       "step": 2100
+     },
+     {
+       "epoch": 81.47,
+       "learning_rate": 7.76578947368421e-05,
+       "loss": 0.5809,
+       "step": 2200
+     },
+     {
+       "epoch": 85.18,
+       "learning_rate": 7.63421052631579e-05,
+       "loss": 0.568,
+       "step": 2300
+     },
+     {
+       "epoch": 88.87,
+       "learning_rate": 7.50263157894737e-05,
+       "loss": 0.5531,
+       "step": 2400
+     },
+     {
+       "epoch": 92.58,
+       "learning_rate": 7.371052631578948e-05,
+       "loss": 0.5429,
+       "step": 2500
+     },
+     {
+       "epoch": 92.58,
+       "eval_cer": 0.08421108068253634,
+       "eval_loss": 0.4524887800216675,
+       "eval_runtime": 33.8823,
+       "eval_samples_per_second": 23.818,
+       "eval_steps_per_second": 2.981,
+       "eval_wer": 0.3147366489264085,
+       "step": 2500
+     },
+     {
+       "epoch": 96.29,
+       "learning_rate": 7.239473684210527e-05,
+       "loss": 0.5476,
+       "step": 2600
+     },
+     {
+       "epoch": 99.98,
+       "learning_rate": 7.107894736842106e-05,
+       "loss": 0.5312,
+       "step": 2700
+     },
+     {
+       "epoch": 103.69,
+       "learning_rate": 6.976315789473684e-05,
+       "loss": 0.5228,
+       "step": 2800
+     },
+     {
+       "epoch": 107.4,
+       "learning_rate": 6.846052631578947e-05,
+       "loss": 0.5085,
+       "step": 2900
+     },
+     {
+       "epoch": 111.11,
+       "learning_rate": 6.714473684210527e-05,
+       "loss": 0.5063,
+       "step": 3000
+     },
+     {
+       "epoch": 111.11,
+       "eval_cer": 0.08392142405729934,
+       "eval_loss": 0.46191853284835815,
+       "eval_runtime": 34.7379,
+       "eval_samples_per_second": 23.231,
+       "eval_steps_per_second": 2.907,
+       "eval_wer": 0.31436960910258765,
+       "step": 3000
+     },
+     {
+       "epoch": 114.8,
+       "learning_rate": 6.582894736842105e-05,
+       "loss": 0.4929,
+       "step": 3100
+     },
+     {
+       "epoch": 118.51,
+       "learning_rate": 6.451315789473685e-05,
+       "loss": 0.4921,
+       "step": 3200
+     },
+     {
+       "epoch": 122.22,
+       "learning_rate": 6.319736842105264e-05,
+       "loss": 0.486,
+       "step": 3300
+     },
+     {
+       "epoch": 125.91,
+       "learning_rate": 6.188157894736843e-05,
+       "loss": 0.4775,
+       "step": 3400
+     },
+     {
+       "epoch": 129.62,
+       "learning_rate": 6.056578947368421e-05,
+       "loss": 0.4661,
+       "step": 3500
+     },
+     {
+       "epoch": 129.62,
+       "eval_cer": 0.08178849799873604,
+       "eval_loss": 0.4659934639930725,
+       "eval_runtime": 33.3081,
+       "eval_samples_per_second": 24.228,
+       "eval_steps_per_second": 3.032,
+       "eval_wer": 0.30390897412369244,
+       "step": 3500
+     },
+     {
+       "epoch": 133.33,
+       "learning_rate": 5.9250000000000004e-05,
+       "loss": 0.4571,
+       "step": 3600
+     },
+     {
+       "epoch": 137.04,
+       "learning_rate": 5.793421052631579e-05,
+       "loss": 0.4525,
+       "step": 3700
+     },
+     {
+       "epoch": 140.73,
+       "learning_rate": 5.6618421052631575e-05,
+       "loss": 0.4502,
+       "step": 3800
+     },
+     {
+       "epoch": 144.44,
+       "learning_rate": 5.530263157894737e-05,
+       "loss": 0.4458,
+       "step": 3900
+     },
+     {
+       "epoch": 148.15,
+       "learning_rate": 5.398684210526316e-05,
+       "loss": 0.4353,
+       "step": 4000
+     },
+     {
+       "epoch": 148.15,
+       "eval_cer": 0.08202548978302085,
+       "eval_loss": 0.46947482228279114,
+       "eval_runtime": 33.4569,
+       "eval_samples_per_second": 24.121,
+       "eval_steps_per_second": 3.019,
+       "eval_wer": 0.308313452009543,
+       "step": 4000
+     },
+     {
+       "epoch": 151.84,
+       "learning_rate": 5.2671052631578957e-05,
+       "loss": 0.4299,
+       "step": 4100
+     },
+     {
+       "epoch": 155.55,
+       "learning_rate": 5.135526315789474e-05,
+       "loss": 0.4253,
+       "step": 4200
+     },
+     {
+       "epoch": 159.25,
+       "learning_rate": 5.003947368421053e-05,
+       "loss": 0.4202,
+       "step": 4300
+     },
+     {
+       "epoch": 162.95,
+       "learning_rate": 4.872368421052632e-05,
+       "loss": 0.4133,
+       "step": 4400
+     },
+     {
+       "epoch": 166.65,
+       "learning_rate": 4.740789473684211e-05,
+       "loss": 0.4048,
+       "step": 4500
+     },
+     {
+       "epoch": 166.65,
+       "eval_cer": 0.08244680851063829,
+       "eval_loss": 0.49092647433280945,
+       "eval_runtime": 33.6459,
+       "eval_samples_per_second": 23.985,
+       "eval_steps_per_second": 3.002,
+       "eval_wer": 0.3084969719214535,
+       "step": 4500
+     },
+     {
+       "epoch": 170.36,
+       "learning_rate": 4.6092105263157896e-05,
+       "loss": 0.3994,
+       "step": 4600
+     },
+     {
+       "epoch": 174.07,
+       "learning_rate": 4.4776315789473685e-05,
+       "loss": 0.397,
+       "step": 4700
+     },
+     {
+       "epoch": 177.76,
+       "learning_rate": 4.3460526315789474e-05,
+       "loss": 0.3836,
+       "step": 4800
+     },
+     {
+       "epoch": 181.47,
+       "learning_rate": 4.2144736842105264e-05,
+       "loss": 0.3922,
+       "step": 4900
+     },
+     {
+       "epoch": 185.18,
+       "learning_rate": 4.082894736842105e-05,
+       "loss": 0.3852,
+       "step": 5000
+     },
+     {
+       "epoch": 185.18,
+       "eval_cer": 0.08120918474826207,
+       "eval_loss": 0.5073934197425842,
+       "eval_runtime": 34.5846,
+       "eval_samples_per_second": 23.334,
+       "eval_steps_per_second": 2.92,
+       "eval_wer": 0.3048265736832446,
+       "step": 5000
+     },
+     {
+       "epoch": 188.87,
+       "learning_rate": 3.951315789473685e-05,
+       "loss": 0.3784,
+       "step": 5100
+     },
+     {
+       "epoch": 192.58,
+       "learning_rate": 3.819736842105263e-05,
+       "loss": 0.3733,
+       "step": 5200
+     },
+     {
+       "epoch": 196.29,
+       "learning_rate": 3.688157894736842e-05,
+       "loss": 0.3775,
+       "step": 5300
+     },
+     {
+       "epoch": 199.98,
+       "learning_rate": 3.5565789473684217e-05,
+       "loss": 0.3574,
+       "step": 5400
+     },
+     {
+       "epoch": 203.69,
+       "learning_rate": 3.4250000000000006e-05,
+       "loss": 0.3567,
+       "step": 5500
+     },
+     {
+       "epoch": 203.69,
+       "eval_cer": 0.08099852538445333,
+       "eval_loss": 0.5110859274864197,
+       "eval_runtime": 34.3591,
+       "eval_samples_per_second": 23.487,
+       "eval_steps_per_second": 2.94,
+       "eval_wer": 0.3011561754450358,
+       "step": 5500
+     },
+     {
+       "epoch": 207.4,
+       "learning_rate": 3.293421052631579e-05,
+       "loss": 0.3585,
+       "step": 5600
+     },
+     {
+       "epoch": 211.11,
+       "learning_rate": 3.161842105263158e-05,
+       "loss": 0.3503,
+       "step": 5700
+     },
+     {
+       "epoch": 214.8,
+       "learning_rate": 3.030263157894737e-05,
+       "loss": 0.3452,
+       "step": 5800
+     },
+     {
+       "epoch": 218.51,
+       "learning_rate": 2.8986842105263156e-05,
+       "loss": 0.3446,
+       "step": 5900
+     },
+     {
+       "epoch": 222.22,
+       "learning_rate": 2.768421052631579e-05,
+       "loss": 0.3451,
+       "step": 6000
+     },
+     {
+       "epoch": 222.22,
+       "eval_cer": 0.08041921213397936,
+       "eval_loss": 0.5224971175193787,
+       "eval_runtime": 33.9572,
+       "eval_samples_per_second": 23.765,
+       "eval_steps_per_second": 2.974,
+       "eval_wer": 0.2982198568544687,
+       "step": 6000
+     },
+     {
+       "epoch": 225.91,
+       "learning_rate": 2.6368421052631582e-05,
+       "loss": 0.3334,
+       "step": 6100
+     },
+     {
+       "epoch": 229.62,
+       "learning_rate": 2.505263157894737e-05,
+       "loss": 0.338,
+       "step": 6200
+     },
+     {
+       "epoch": 233.33,
+       "learning_rate": 2.373684210526316e-05,
+       "loss": 0.332,
+       "step": 6300
+     },
+     {
+       "epoch": 237.04,
+       "learning_rate": 2.242105263157895e-05,
+       "loss": 0.3301,
+       "step": 6400
+     },
+     {
+       "epoch": 240.73,
+       "learning_rate": 2.110526315789474e-05,
+       "loss": 0.325,
+       "step": 6500
+     },
+     {
+       "epoch": 240.73,
+       "eval_cer": 0.07955024225826839,
+       "eval_loss": 0.5269995927810669,
+       "eval_runtime": 33.5845,
+       "eval_samples_per_second": 24.029,
+       "eval_steps_per_second": 3.007,
+       "eval_wer": 0.2954670581758121,
+       "step": 6500
+     },
+     {
+       "epoch": 244.44,
+       "learning_rate": 1.9789473684210528e-05,
+       "loss": 0.327,
+       "step": 6600
+     },
+     {
+       "epoch": 248.15,
+       "learning_rate": 1.8473684210526317e-05,
+       "loss": 0.3161,
+       "step": 6700
+     },
+     {
+       "epoch": 251.84,
+       "learning_rate": 1.7157894736842107e-05,
+       "loss": 0.3193,
+       "step": 6800
+     },
+     {
+       "epoch": 255.55,
+       "learning_rate": 1.5842105263157896e-05,
+       "loss": 0.3104,
+       "step": 6900
+     },
+     {
+       "epoch": 259.25,
+       "learning_rate": 1.4539473684210528e-05,
+       "loss": 0.3089,
+       "step": 7000
+     },
+     {
+       "epoch": 259.25,
+       "eval_cer": 0.07933958289445966,
+       "eval_loss": 0.5381476283073425,
+       "eval_runtime": 35.2467,
+       "eval_samples_per_second": 22.896,
+       "eval_steps_per_second": 2.866,
+       "eval_wer": 0.29289777940906586,
+       "step": 7000
+     },
+     {
+       "epoch": 262.95,
+       "learning_rate": 1.3223684210526315e-05,
+       "loss": 0.3087,
+       "step": 7100
+     },
+     {
+       "epoch": 266.65,
+       "learning_rate": 1.1907894736842106e-05,
+       "loss": 0.305,
+       "step": 7200
+     },
+     {
+       "epoch": 270.36,
+       "learning_rate": 1.0592105263157895e-05,
+       "loss": 0.3028,
+       "step": 7300
+     },
+     {
+       "epoch": 274.07,
+       "learning_rate": 9.276315789473685e-06,
+       "loss": 0.2989,
+       "step": 7400
+     },
+     {
+       "epoch": 277.76,
+       "learning_rate": 7.960526315789474e-06,
+       "loss": 0.2941,
+       "step": 7500
+     },
+     {
+       "epoch": 277.76,
+       "eval_cer": 0.07941858015588793,
+       "eval_loss": 0.5565056204795837,
+       "eval_runtime": 34.8538,
+       "eval_samples_per_second": 23.154,
+       "eval_steps_per_second": 2.898,
+       "eval_wer": 0.29234721967333455,
+       "step": 7500
+     },
+     {
+       "epoch": 281.47,
+       "learning_rate": 6.644736842105263e-06,
+       "loss": 0.2983,
+       "step": 7600
+     },
+     {
+       "epoch": 285.18,
+       "learning_rate": 5.328947368421053e-06,
+       "loss": 0.2944,
+       "step": 7700
+     },
+     {
+       "epoch": 288.87,
+       "learning_rate": 4.013157894736842e-06,
+       "loss": 0.2941,
+       "step": 7800
+     },
+     {
+       "epoch": 292.58,
+       "learning_rate": 2.6973684210526316e-06,
+       "loss": 0.2947,
+       "step": 7900
+     },
+     {
+       "epoch": 296.29,
+       "learning_rate": 1.381578947368421e-06,
+       "loss": 0.2945,
+       "step": 8000
+     },
+     {
+       "epoch": 296.29,
+       "eval_cer": 0.07894459658731831,
+       "eval_loss": 0.549480140209198,
+       "eval_runtime": 34.1061,
+       "eval_samples_per_second": 23.661,
+       "eval_steps_per_second": 2.961,
+       "eval_wer": 0.2951000183519912,
+       "step": 8000
+     },
+     {
+       "epoch": 299.98,
+       "learning_rate": 6.578947368421053e-08,
+       "loss": 0.2913,
+       "step": 8100
+     },
+     {
+       "epoch": 299.98,
+       "step": 8100,
+       "total_flos": 1.5339574491038086e+20,
+       "train_loss": 0.7936776961809323,
+       "train_runtime": 54993.0112,
+       "train_samples_per_second": 19.142,
+       "train_steps_per_second": 0.147
+     }
+   ],
+   "max_steps": 8100,
+   "num_train_epochs": 300,
+   "total_flos": 1.5339574491038086e+20,
+   "trial_name": null,
+   "trial_params": null
+ }