Commit f89747b (parent 767509b) by xekri: Updates README

Files changed (1):
  1. README.md +26 -8

README.md CHANGED
@@ -12,31 +12,48 @@ model-index:
 results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# mms-common_voice_13_0-eo-1
-
-This model is a fine-tuned version of [patrickvonplaten/mms-300m](https://huggingface.co/patrickvonplaten/mms-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_13_0 - EO dataset.
+# mms-common_voice_13_0-eo-1, an Esperanto speech recognizer
+
+This model is a fine-tuned version of [patrickvonplaten/mms-300m](https://huggingface.co/patrickvonplaten/mms-300m) on the [mozilla-foundation/common_voice_13_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0) Esperanto dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.2257
 - Cer: 0.0209
 - Wer: 0.0678

+While the training loss is lower, this model does not perform significantly better than [xekri/wav2vec2-common_voice_13_0-eo-3](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-3).
+
+The first 10 samples in the test set:
+
+| Actual<br>Predicted | CER |
+|:-------------------:|:---:|
+| `la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo`<br>`la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo` | 0.0 |
+| `en la sekva jaro li ricevis premion`<br>`en la sekva jaro li ricevis premion` | 0.0 |
+| `ŝi studis historion ĉe la universitato de brita kolumbio`<br>`ŝi studis historion ĉe la universitato de brita kolumbio` | 0.0 |
+| `larĝaj ŝtupoj kuras al la fasado`<br>`larĝaj ŝtupoj kuras al la fasado` | 0.0 |
+| `la municipo ĝuas duan epokon de etendo kaj disvolviĝo`<br>`la municipo ĝuas duan epokon de etendo kaj disvolviĝo` | 0.0 |
+| `li estis ankaŭ katedrestro kaj dekano`<br>`li estis ankaŭ katedresto kaj dekano` | 0.02702702702702703 |
+| `librovendejo apartenas al la muzeo`<br>`librovendejo apartenas al la muzeo` | 0.0 |
+| `ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵaro de arbaroj`<br>`ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵo de arbaroj` | 0.02702702702702703 |
+| `unue ili estas ruĝaj poste brunaj`<br>`unue ili estas ruĝaj poste brunaj` | 0.0 |
+| `la loĝantaro laboras en la proksima ĉefurbo`<br>`la loĝantaro laboras en la proksima ĉefurbo` | 0.0 |
+
 ## Model description

-More information needed
+See [patrickvonplaten/mms-300m](https://huggingface.co/patrickvonplaten/mms-300m), or equivalently, [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53); the only apparent difference is that the speech front end of the mms-300m checkpoint was pretrained on more languages and data.

 ## Intended uses & limitations

-More information needed
+Speech recognition for Esperanto. The base model was pretrained and fine-tuned on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.

 ## Training and evaluation data

-More information needed
+The training split was set to `train[:15000]` while the eval split was set to `validation[:1500]`.

 ## Training procedure

+The same as [xekri/wav2vec2-common_voice_13_0-eo-3](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-3).
+
 ### Training hyperparameters

 The following hyperparameters were used during training:
@@ -47,6 +64,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- layerdrop: 0.1
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
 - num_epochs: 100
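The CER values in the sample table can be reproduced with a plain edit-distance computation. The card does not show its scoring script, so the following is only a minimal illustrative sketch (for example, row 6 drops one `r` from a 37-character reference, giving 1/37 ≈ 0.027):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via a single-row dynamic program.

    Works on any sequences (strings for CER, token lists for WER).
    """
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    """Character error rate: character edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

# Row 6 of the table: `katedrestro` -> `katedresto` is one deletion.
# cer(...) == 1/37 == 0.02702702702702703
print(cer("li estis ankaŭ katedrestro kaj dekano",
          "li estis ankaŭ katedresto kaj dekano"))
```

In practice a library such as `jiwer` computes the same quantities; the hand-rolled version above just makes the arithmetic behind the table explicit.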
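The batch-size arithmetic and the shape of the `linear` scheduler implied by these hyperparameters can be sketched as follows. Only `gradient_accumulation_steps`, `total_train_batch_size`, the `linear` scheduler type, and the 500 warmup steps come from the card; the peak learning rate and total step count are not visible in this hunk, so the values below are placeholders:

```python
# Placeholder values -- NOT from the card's visible hyperparameters:
peak_lr = 3e-4        # hypothetical peak learning rate
total_steps = 10_000  # hypothetical total optimizer steps

warmup_steps = 500    # lr_scheduler_warmup_steps, from the card

def linear_lr(step):
    """Shape of a 'linear' schedule: ramp to peak_lr over warmup_steps,
    then decay linearly to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, total_steps - step) / (total_steps - warmup_steps)

# Effective batch size, from the card: per-device batch * accumulation steps.
gradient_accumulation_steps = 4
total_train_batch_size = 32
per_device_batch_size = total_train_batch_size // gradient_accumulation_steps  # 8
```

With gradient accumulation of 4, each optimizer step sees 32 examples even though only 8 fit on the device per forward pass.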