---
license: cc-by-nc-4.0
tags:
- generated_from_trainer
base_model: nguyenvulebinh/wav2vec2-base-vietnamese-250h
datasets:
- common_voice_17_0
metrics:
- wer
model-index:
- name: wav2vec2-common-voice-17_0_vi
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: common_voice_17_0
      type: common_voice_17_0
      config: vi
      split: None
      args: vi
    metrics:
    - type: wer
      value: 0.43487928843710294
      name: Wer
---

# wav2vec2-common-voice-17_0_vi

This model is a fine-tuned version of [nguyenvulebinh/wav2vec2-base-vietnamese-250h](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) on the common_voice_17_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7992
- Wer: 0.4349

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer    |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 0.261         | 4.3103  | 500  | 0.4182          | 0.3492 |
| 0.2061        | 8.6207  | 1000 | 0.5416          | 0.4044 |
| 0.1883        | 12.9310 | 1500 | 0.6796          | 0.4304 |
| 0.1336        | 17.2414 | 2000 | 0.8089          | 0.4378 |
| 0.1257        | 21.5517 | 2500 | 0.8244          | 0.4426 |
| 0.098         | 25.8621 | 3000 | 0.7992          | 0.4349 |

### Framework versions

- Transformers 4.40.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
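
The scheduler settings listed under the training hyperparameters (learning_rate 0.0001, linear schedule, 1000 warmup steps, 30 epochs) can be illustrated with a small sketch of how the learning rate would evolve per step. This is not the training script: the helper `linear_lr` is hypothetical, and the total step count is an estimate inferred from the results table (500 steps at epoch 4.3103 suggests about 116 optimizer steps per epoch, so roughly 3480 steps over 30 epochs).

```python
# Illustrative sketch only; names and the total-step estimate are assumptions,
# not taken from the original training code.

BASE_LR = 1e-4          # learning_rate from the card
WARMUP_STEPS = 1000     # lr_scheduler_warmup_steps from the card
NUM_EPOCHS = 30         # num_epochs from the card
STEPS_PER_EPOCH = 116   # assumption: inferred from 500 steps ~ 4.3103 epochs
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # estimated 3480 optimizer steps


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: ramp linearly from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the estimated rate at the evaluation steps logged in the table.
    for step in (500, 1000, 1500, 2000, 2500, 3000):
        print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at step 1000, which is the second logged evaluation, and decays towards zero over the remaining steps.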