	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	datasets:
	- common_voice_8_0
	metrics:
	- wer
	model-index:
	- name: wav2vec2-large-xls-r-1b-frisian-cv-8-1h
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: common_voice_8_0
	      type: common_voice_8_0
	      config: fy-NL
	      split: validation
	      args: fy-NL
	    metrics:
	    - name: Wer
	      type: wer
	      value: 0.23732323953720896
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: common_voice_8_0
	      type: common_voice_8_0
	      config: fy-NL
	      split: test
	      args: fy-NL
	    metrics:
	    - name: Wer
	      type: wer
	      value: 0.25404682757623936
	---


	# wav2vec2-large-xls-r-1b-frisian-cv-8-1h

	This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the Frisian (`fy-NL`) subset of the common_voice_8_0 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.4120
	- Wer: 0.2373

	On the test set, it achieves:
	- Wer: 0.2540
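
For readers unfamiliar with the metric, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal, dependency-free sketch of that definition, not the evaluation code used for this model. The function names are illustrative, and the sketch pools errors over all utterances before dividing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        hyp_words = hyp.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    return total_errors / total_words


# Placeholder tokens, not real transcripts: one substitution and one
# deletion against a four-word reference.
example_wer = word_error_rate(["a b c d"], ["a x c"])
```

Under this definition, a WER of 0.2540 corresponds to roughly one word error for every four reference words.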

	## Model description

	This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 4, in which
	the training set is 1 hour of Frisian speech randomly selected from all validated data, excluding the test and evaluation sets.

	## Intended uses & limitations

	The model is intended for recognizing Frisian speech.

	Limitations: the model does not use language model (LM) rescoring, and it was trained on version 8.0 of Common Voice rather than the newer 13.0 release.
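
A minimal inference sketch, assuming the `transformers` library (plus an audio backend such as ffmpeg for decoding files) is installed and that the checkpoint is published under the repository ID shown. The helper name `transcribe` and the audio path in the comment are hypothetical. Because no language model is attached, the pipeline returns plain CTC-decoded text.

```python
MODEL_ID = "greenw0lf/wav2vec2-large-xls-r-1b-frisian-cv-8-1h"


def transcribe(audio_path, model_id=MODEL_ID):
    """Return the transcript of one audio file (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The ASR pipeline returns a dict whose "text" key holds the transcript.
    return asr(audio_path)["text"]


# Usage, with a placeholder path:
#   print(transcribe("sample_frisian.wav"))
```

Loading the pipeline once and reusing it across files avoids reloading the 1B-parameter checkpoint on every call; the helper above reloads it for simplicity.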

	## Training and evaluation data

	The evaluation split is the one provided with the Common Voice 8.0 Frisian subset. The training split is 1 hour of Frisian speech randomly selected from the validated data, excluding all recordings that appear in the test and evaluation splits.
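
The selection described above can be sketched as follows. This is an illustration under stated assumptions, not the script used for the thesis (that lives in the linked repository): each clip is assumed to be a dict with hypothetical `id` and `duration` (seconds) fields, and the seed value is arbitrary.

```python
import random


def select_subset(clips, excluded_ids, target_seconds=3600.0, seed=0):
    """Randomly pick clips until about target_seconds of audio is collected.

    clips: list of dicts with hypothetical keys "id" and "duration".
    excluded_ids: IDs of clips in the test and evaluation splits.
    """
    rng = random.Random(seed)
    # Drop anything that also appears in the held-out splits.
    candidates = [c for c in clips if c["id"] not in excluded_ids]
    rng.shuffle(candidates)

    selected = []
    total = 0.0
    for clip in candidates:
        if total >= target_seconds:
            break
        selected.append(clip)
        total += clip["duration"]
    return selected, total


# Toy data: 100 five-second clips, three of which are held out.
toy_clips = [{"id": i, "duration": 5.0} for i in range(100)]
subset, seconds = select_subset(toy_clips, excluded_ids={0, 1, 2},
                                target_seconds=60.0)
```

The loop stops at the first clip that reaches the target, so the result can overshoot by at most one clip's duration.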

	## Training procedure

	The script used to train this model is available on GitHub: [greenw0lf/MSc-VT-Thesis](https://github.com/greenw0lf/MSc-VT-Thesis/).

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 6e-05
	- train_batch_size: 32
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 80
	- mixed_precision_training: Native AMP
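
To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step for a linear schedule with a 10% warmup. It is a plain-Python illustration rather than the Trainer's own code. The steps-per-epoch value is an assumption inferred from the training results (step 100 is reported at epoch 4.35, which is consistent with 23 steps per epoch).

```python
import math

LEARNING_RATE = 6e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 80
STEPS_PER_EPOCH = 23  # assumption: inferred from step 100 ~ epoch 4.35

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # rounded up


def lr_at(step):
    """Learning rate after `step` optimizer updates."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Linear decay from the peak down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 92, 184, 1012, 1840):
    print(step, lr_at(step))
```

Under these assumptions the peak rate of 6e-05 is reached at step 184, and the rate then decays to zero at step 1840.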

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Wer    |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| 6.2987        | 4.35  | 100  | 3.0210          | 1.0    |
	| 3.1424        | 8.7   | 200  | 2.9611          | 1.0    |
	| 2.6299        | 13.04 | 300  | 0.9929          | 0.8377 |
	| 1.3134        | 17.39 | 400  | 0.5679          | 0.5264 |
	| 0.9747        | 21.74 | 500  | 0.4516          | 0.3764 |
	| 0.8755        | 26.09 | 600  | 0.4515          | 0.3403 |
	| 0.7227        | 30.43 | 700  | 0.4169          | 0.3211 |
	| 0.6634        | 34.78 | 800  | 0.4159          | 0.2962 |
	| 0.5568        | 39.13 | 900  | 0.4081          | 0.2795 |
	| 0.7943        | 43.48 | 1000 | 0.4090          | 0.2709 |
	| 0.5537        | 47.83 | 1100 | 0.4239          | 0.2649 |
	| 0.5596        | 52.17 | 1200 | 0.4029          | 0.2561 |
	| 0.5523        | 56.52 | 1300 | 0.4073          | 0.2524 |
	| 0.4579        | 60.87 | 1400 | 0.4098          | 0.2470 |
	| 0.6477        | 65.22 | 1500 | 0.4099          | 0.2446 |
	| 0.4957        | 69.57 | 1600 | 0.4167          | 0.2475 |
	| 0.3246        | 73.91 | 1700 | 0.4146          | 0.2389 |
	| 0.3937        | 78.26 | 1800 | 0.4120          | 0.2373 |
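
The table shows that validation loss and WER do not reach their minimum at the same step. The short sketch below, with the rows copied from the table, locates each minimum; the variable names are illustrative.

```python
# (step, validation_loss, wer) rows copied from the training results table.
ROWS = [
    (100, 3.0210, 1.0),
    (200, 2.9611, 1.0),
    (300, 0.9929, 0.8377),
    (400, 0.5679, 0.5264),
    (500, 0.4516, 0.3764),
    (600, 0.4515, 0.3403),
    (700, 0.4169, 0.3211),
    (800, 0.4159, 0.2962),
    (900, 0.4081, 0.2795),
    (1000, 0.4090, 0.2709),
    (1100, 0.4239, 0.2649),
    (1200, 0.4029, 0.2561),
    (1300, 0.4073, 0.2524),
    (1400, 0.4098, 0.2470),
    (1500, 0.4099, 0.2446),
    (1600, 0.4167, 0.2475),
    (1700, 0.4146, 0.2389),
    (1800, 0.4120, 0.2373),
]

best_loss_row = min(ROWS, key=lambda row: row[1])  # lowest validation loss
best_wer_row = min(ROWS, key=lambda row: row[2])   # lowest WER

print("lowest validation loss:", best_loss_row)
print("lowest WER:", best_wer_row)
```

Validation loss is lowest at step 1200 (0.4029), while WER keeps improving until the last logged step, 1800 (0.2373), which matches the evaluation result reported at the top of this card.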


	### Framework versions

	- Transformers 4.28.1
	- PyTorch 2.0.0+cu117
	- Datasets 2.11.0
	- Tokenizers 0.13.3