	---
	language:
	- eo
	tags:
	- automatic-speech-recognition
	- mozilla-foundation/common_voice_13_0
	- generated_from_trainer
	metrics:
	- wer
	model-index:
	- name: mms-common_voice_13_0-eo-1
	  results: []
	---

	# mms-common_voice_13_0-eo-1, an Esperanto speech recognizer

	This model is a fine-tuned version of [patrickvonplaten/mms-300m](https://huggingface.co/patrickvonplaten/mms-300m) on the [mozilla-foundation/common_voice_13_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0) Esperanto dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2257
	- Cer: 0.0209
	- Wer: 0.0678

	While the training loss is lower, this model does not perform significantly better than [xekri/wav2vec2-common_voice_13_0-eo-3](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-3).

	The first 10 samples in the test set:

	| Actual<br>Predicted | CER |
	|:--------------------|:----|
	| `la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo`<br>`la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo` | 0.0 |
	| `en la sekva jaro li ricevis premion`<br>`en la sekva jaro li ricevis premion` | 0.0 |
	| `ŝi studis historion ĉe la universitato de brita kolumbio`<br>`ŝi studis historion ĉe la universitato de brita kolumbio` | 0.0 |
	| `larĝaj ŝtupoj kuras al la fasado`<br>`larĝaj ŝtupoj kuras al la fasado` | 0.0 |
	| `la municipo ĝuas duan epokon de etendo kaj disvolviĝo`<br>`la municipo ĝuas duan epokon de etendo kaj disvolviĝo` | 0.0 |
	| `li estis ankaŭ katedrestro kaj dekano`<br>`li estis ankaŭ katedresto kaj dekano` | 0.02702702702702703 |
	| `librovendejo apartenas al la muzeo`<br>`librovendejo apartenas al la muzeo` | 0.0 |
	| `ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵaro de arbaroj`<br>`ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵo de arbaroj` | 0.02702702702702703 |
	| `unue ili estas ruĝaj poste brunaj`<br>`unue ili estas ruĝaj poste brunaj` | 0.0 |
	| `la loĝantaro laboras en la proksima ĉefurbo`<br>`la loĝantaro laboras en la proksima ĉefurbo` | 0.0 |
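
The per-sample CER values above can be reproduced with a plain edit-distance calculation. The sketch below is a minimal illustration and assumes the usual definition (character-level Levenshtein distance divided by the reference length); the card does not say which tool produced the numbers, and the helper names `edit_distance`, `cer` and `wer` are made up for this example.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def _nfc(text):
    # Normalize so that letters such as "ŭ" count as a single character.
    return unicodedata.normalize("NFC", text)


def cer(reference, prediction):
    """Character error rate: character edits divided by reference length."""
    ref = _nfc(reference)
    return edit_distance(ref, _nfc(prediction)) / len(ref)


def wer(reference, prediction):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = _nfc(reference).split()
    return edit_distance(ref_words, _nfc(prediction).split()) / len(ref_words)


actual = "li estis ankaŭ katedrestro kaj dekano"
predicted = "li estis ankaŭ katedresto kaj dekano"
print(cer(actual, predicted))  # one missing character out of 37
print(wer(actual, predicted))  # one wrong word out of 6
```

A single dropped letter costs only 1/37 in CER but 1/6 in WER for this sentence, which is one reason the WER reported for the model is several times its CER.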

	## Model description

	See [patrickvonplaten/mms-300m](https://huggingface.co/patrickvonplaten/mms-300m), or equivalently [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53); as far as I can tell, the only difference is that the speech front-end in the mms-300m checkpoint was trained on more languages and more data.

	## Intended uses & limitations

	Speech recognition for Esperanto. The base model was pretrained and fine-tuned on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
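
One way to enforce the sampling-rate requirement is to check it before the audio reaches the model. The following is a rough sketch, not the author's inference code: `transcribe` is a hypothetical helper, the Hub id of this model is left as a parameter because the card does not spell it out, and the waveform is assumed to be a mono 1-D float NumPy array.

```python
TARGET_SAMPLE_RATE = 16_000  # the rate the base model was trained on


def transcribe(model_id, waveform, sample_rate):
    """Transcribe a mono waveform, refusing audio that is not at 16 kHz.

    `model_id` is the Hub id of the fine-tuned checkpoint (supplied by the
    caller). `waveform` is assumed to be a 1-D float NumPy array.
    """
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the input first"
        )
    # Imported lazily so the rate check works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr({"raw": waveform, "sampling_rate": sample_rate})
    return result["text"]
```

Raising early keeps mismatched audio, for example 44.1 kHz recordings, from being silently transcribed at the wrong rate; resampling can be done beforehand with whatever audio library is already in use.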

	## Training and evaluation data

	The training split was set to `train[:15000]` while the eval split was set to `validation[:1500]`.
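
These split strings use the slicing syntax of the Datasets library. As a sketch of how they might be loaded (assuming the `eo` configuration for Esperanto, a Datasets version close to the one listed under Framework versions, and that access to the gated Common Voice repository has already been granted and authenticated; `load_splits` is a name invented for this example):

```python
DATASET_ID = "mozilla-foundation/common_voice_13_0"
LANGUAGE = "eo"  # Common Voice configuration name for Esperanto

# Slice expressions as given in this card.
SPLITS = {"train": "train[:15000]", "eval": "validation[:1500]"}


def load_splits():
    """Return the train and eval subsets described in this card."""
    # Imported lazily so the constants above can be inspected without datasets.
    from datasets import load_dataset

    return {
        name: load_dataset(DATASET_ID, LANGUAGE, split=spec)
        for name, spec in SPLITS.items()
    }
```

`train[:15000]` takes the first 15,000 rows of the training split and `validation[:1500]` the first 1,500 rows of the validation split, so the subsets are deterministic rather than randomly sampled.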

	## Training procedure

	The same as [xekri/wav2vec2-common_voice_13_0-eo-3](https://huggingface.co/xekri/wav2vec2-common_voice_13_0-eo-3).

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- layerdrop: 0.1
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 100
	- mixed_precision_training: Native AMP
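
The batch-size and scheduler entries above fit together as follows. This is a back-of-the-envelope sketch, not the training script: it assumes all 15,000 training samples were kept, and `total_steps` is an illustrative value (100 epochs at that rate) rather than a figure reported in this card. The schedule follows the usual linear warmup and linear decay shape.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1            # implied by 8 * 4 = 32 total batch size
TRAIN_SAMPLES = 15_000     # from the train[:15000] split
BASE_LR = 3e-5
WARMUP_STEPS = 500

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
updates_per_epoch = TRAIN_SAMPLES / effective_batch


def linear_lr(step, total_steps, base_lr=BASE_LR, warmup=WARMUP_STEPS):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = (total_steps - step) / max(1, total_steps - warmup)
    return base_lr * max(0.0, remaining)


total_steps = 46_875  # assumption: 100 epochs * 468.75 updates per epoch

print(effective_batch)                      # 32
print(round(1000 / updates_per_epoch, 2))   # 2.13 epochs after 1000 updates
print(linear_lr(250, total_steps))          # halfway through warmup
print(linear_lr(500, total_steps))          # peak learning rate
```

With gradient accumulation, each optimizer update sees 32 samples even though only 8 fit on the device at once, which is why 1000 steps correspond to roughly two epochs.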

	### Training results

	| Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
	|:-------------:|:-----:|:-----:|:------:|:---------------:|:------:|
	| 2.3129 | 2.13 | 1000 | 0.0580 | 0.5042 | 0.2703 |
	| 0.2251 | 4.27 | 2000 | 0.0295 | 0.1782 | 0.1198 |
	| 0.1462 | 6.4 | 3000 | 0.0265 | 0.1635 | 0.1019 |
	| 0.1162 | 8.53 | 4000 | 0.0248 | 0.1619 | 0.0931 |
	| 0.0988 | 10.67 | 5000 | 0.0249 | 0.1654 | 0.0940 |
	| 0.0904 | 12.8 | 6000 | 0.0242 | 0.1702 | 0.0845 |
	| 0.0813 | 14.93 | 7000 | 0.0239 | 0.1658 | 0.0846 |
	| 0.074 | 17.09 | 8000 | 0.0240 | 0.1763 | 0.0793 |
	| 0.0692 | 19.22 | 9000 | 0.0243 | 0.1768 | 0.0835 |
	| 0.0652 | 21.36 | 10000 | 0.0237 | 0.1812 | 0.0797 |
	| 0.0593 | 23.5 | 11000 | 0.0221 | 0.1810 | 0.0750 |
	| 0.0547 | 25.63 | 12000 | 0.0233 | 0.1835 | 0.0794 |
	| 0.0514 | 27.76 | 13000 | 0.0224 | 0.1828 | 0.0761 |
	| 0.0488 | 29.9 | 14000 | 0.0224 | 0.1844 | 0.0766 |
	| 0.0478 | 32.03 | 15000 | 0.0226 | 0.1910 | 0.0769 |
	| 0.0459 | 34.16 | 16000 | 0.0239 | 0.1965 | 0.0831 |
	| 0.0429 | 36.3 | 17000 | 0.0220 | 0.2000 | 0.0760 |
	| 0.0443 | 38.43 | 18000 | 0.0228 | 0.2039 | 0.0774 |
	| 0.0398 | 40.56 | 19000 | 0.0219 | 0.1981 | 0.0755 |
	| 0.0408 | 42.7 | 20000 | 0.0239 | 0.2053 | 0.0776 |
	| 0.0406 | 44.83 | 21000 | 0.0221 | 0.2050 | 0.0740 |
	| 0.0383 | 46.96 | 22000 | 0.0224 | 0.2128 | 0.0733 |
	| 0.0379 | 49.1 | 23000 | 0.0220 | 0.2110 | 0.0731 |
	| 0.0369 | 51.23 | 24000 | 0.0220 | 0.2145 | 0.0745 |
	| 0.0341 | 53.36 | 25000 | 0.0222 | 0.2146 | 0.0725 |
	| 0.0322 | 55.5 | 26000 | 0.0216 | 0.2130 | 0.0710 |
	| 0.0316 | 57.63 | 27000 | 0.0222 | 0.2134 | 0.0716 |
	| 0.0324 | 59.76 | 28000 | 0.0222 | 0.2172 | 0.0731 |
	| 0.0315 | 61.9 | 29000 | 0.0228 | 0.2207 | 0.0745 |
	| 0.0294 | 64.03 | 30000 | 0.0218 | 0.2183 | 0.0717 |
	| 0.028 | 66.16 | 31000 | 0.0214 | 0.2185 | 0.0696 |
	| 0.0263 | 68.3 | 32000 | 0.0215 | 0.2167 | 0.0696 |
	| 0.0299 | 70.43 | 33000 | 0.0217 | 0.2201 | 0.0709 |
	| 0.0273 | 72.56 | 34000 | 0.0222 | 0.2164 | 0.0724 |
	| 0.0269 | 74.7 | 35000 | 0.0220 | 0.2240 | 0.0693 |
	| 0.0264 | 76.92 | 36000 | 0.0218 | 0.2220 | 0.0704 |
	| 0.0257 | 79.05 | 37000 | 0.0217 | 0.2229 | 0.0688 |
	| 0.0251 | 81.19 | 38000 | 0.0215 | 0.2263 | 0.0694 |
	| 0.0245 | 83.32 | 39000 | 0.0210 | 0.2253 | 0.0673 |
	| 0.0243 | 85.45 | 40000 | 0.0215 | 0.2264 | 0.0692 |
	| 0.0236 | 87.59 | 41000 | 0.0217 | 0.2261 | 0.0689 |
	| 0.0225 | 89.72 | 42000 | 0.0212 | 0.2265 | 0.0680 |
	| 0.023 | 91.85 | 43000 | 0.0210 | 0.2265 | 0.0674 |
	| 0.0217 | 93.99 | 44000 | 0.0209 | 0.2265 | 0.0677 |
	| 0.022 | 96.12 | 45000 | 0.0211 | 0.2254 | 0.0685 |
	| 0.0219 | 98.25 | 46000 | 0.0208 | 0.2262 | 0.0672 |
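
Which checkpoint counts as "best" depends on the metric. The sketch below copies a handful of rows from the table, reading the value near 0.22 as validation loss and the value near 0.02 as CER throughout, and picks the minimum under each criterion. It is only an illustration of the trade-off; the card does not say which criterion was used to select the published weights.

```python
# (step, validation_loss, cer, wer), a subset of the logged evaluations
rows = [
    (1000, 0.5042, 0.0580, 0.2703),
    (4000, 0.1619, 0.0248, 0.0931),
    (11000, 0.1810, 0.0221, 0.0750),
    (26000, 0.2130, 0.0216, 0.0710),
    (39000, 0.2253, 0.0210, 0.0673),
    (46000, 0.2262, 0.0208, 0.0672),
]

best_by_loss = min(rows, key=lambda row: row[1])
best_by_cer = min(rows, key=lambda row: row[2])
best_by_wer = min(rows, key=lambda row: row[3])

print("lowest validation loss at step", best_by_loss[0])
print("lowest CER at step", best_by_cer[0])
print("lowest WER at step", best_by_wer[0])
```

Validation loss bottoms out early (step 4000) and then climbs, while CER and WER keep improving slowly to the end of training, so selecting on loss alone would have stopped well before the error rates were at their lowest.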


	### Framework versions

	- Transformers 4.29.1
	- Pytorch 2.0.1+cu118
	- Datasets 2.12.0
	- Tokenizers 0.13.3
