---
base_model: facebook/w2v-bert-2.0
datasets:
- common_voice_10_0
metrics:
- wer
model-index:
- name: w2v-bert-2.0-uk
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_10_0
      type: common_voice_10_0
      config: uk
      split: test
      args: uk
    metrics:
    - name: Wer
      type: wer
      value: 0.0655
---

# wav2vec2-bert-uk

- Join our Speech Recognition Group in Telegram: https://t.me/speech_recognition_uk
- Join our **Discord server**, where we talk about AI: https://discord.gg/nmUCXz55

Quality:

- AM (acoustic model only):
  - WER: 0.0727
  - CER: 0.0151
  - Accuracy: 92.73%
- AM + LM (acoustic model with language model decoding):
  - WER: 0.0655
  - CER: 0.0139
  - Accuracy: 93.45%

This model was trained on two RTX A4000 GPUs with the following hyperparameters:

```
torchrun --standalone --nnodes=1 --nproc-per-node=2 ../train_w2v2_bert.py \
  --custom_set ~/cv10/train.csv \
  --custom_set_eval ~/cv10/test.csv \
  --num_train_epochs 15 \
  --tokenize_config . \
  --w2v2_bert_model facebook/w2v-bert-2.0 \
  --batch 4 \
  --num_proc 5 \
  --grad_accum 1 \
  --learning_rate 3e-5 \
  --logging_steps 20 \
  --eval_step 500 \
  --group_by_length \
  --attention_dropout 0.0 \
  --activation_dropout 0.05 \
  --feat_proj_dropout 0.05 \
  --feat_quantizer_dropout 0.0 \
  --hidden_dropout 0.05 \
  --layerdrop 0.0 \
  --final_dropout 0.0 \
  --mask_time_prob 0.0 \
  --mask_time_length 10 \
  --mask_feature_prob 0.0 \
  --mask_feature_length 10
```
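The Quality figures follow the standard definitions: WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and CER is the same at the character level. The reported Accuracy matches (1 − WER) × 100 for both rows (1 − 0.0727 = 0.9273, 1 − 0.0655 = 0.9345). The card does not say which tool or text normalization produced these numbers, so the following is only a minimal pure-Python sketch of the metric definitions. The helper names `edit_distance`, `corpus_wer`, `corpus_cer`, and `accuracy_percent` are hypothetical, and no lowercasing or punctuation stripping is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for sub/ins/del)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs, hyps):
    """Total word edits divided by total reference words (hypothetical helper)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def corpus_cer(refs, hyps):
    """Total character edits divided by total reference characters (hypothetical helper)."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def accuracy_percent(wer):
    """Accuracy as it appears to be reported in this card: (1 - WER) * 100."""
    return round((1.0 - wer) * 100, 2)


# Reproduce the Accuracy column from the reported WER values
print(accuracy_percent(0.0727))  # 92.73 (AM)
print(accuracy_percent(0.0655))  # 93.45 (AM + LM)
```

For example, with the reference `"the cat sat on the mat"` and the hypothesis `"the cat sit on mat"`, `corpus_wer` gives 2/6 (one substitution, one deletion) and `corpus_cer` gives 5/22. For real evaluations, a maintained library is preferable to this sketch.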