---
base_model: facebook/w2v-bert-2.0
datasets:
- common_voice_10_0
metrics:
- wer
model-index:
- name: w2v-bert-2.0-uk
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_10_0
      type: common_voice_10_0
      config: uk
      split: test
      args: uk
    metrics:
    - name: Wer
      type: wer
      value: 0.0655
---

# wav2vec2-bert-uk

- Join our Speech Recognition Group on Telegram: https://t.me/speech_recognition_uk
- Join our **Discord server** (https://discord.gg/nmUCXz55), where we talk about AI

Quality on the Common Voice 10 (`uk`) test set:

- Acoustic model (AM) only:
  - WER: 0.0727
  - CER: 0.0151
  - Accuracy: 92.73%
- AM + language model (LM):
  - WER: 0.0655
  - CER: 0.0139
  - Accuracy: 93.45%
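
The checkpoint can be loaded like any `transformers` CTC model. Below is a minimal inference sketch with greedy (AM-only) decoding, assuming 16 kHz mono audio; the repository id is a placeholder, not the actual model id:

```python
import torch
import torchaudio
from transformers import AutoModelForCTC, AutoProcessor

# Placeholder -- replace with the actual repository id of this model.
model_id = "<org>/w2v-bert-2.0-uk"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)
model.eval()

# Load an audio file and resample to the 16 kHz rate expected by w2v-bert-2.0.
waveform, sample_rate = torchaudio.load("sample.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

# The processor turns raw audio into the model's input features.
inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding (acoustic model only, no external language model).
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```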

This model was trained with the following hyperparameters on two RTX A4000 GPUs:

```
torchrun --standalone --nnodes=1 --nproc-per-node=2 ../train_w2v2_bert.py \
  --custom_set ~/cv10/train.csv \
  --custom_set_eval ~/cv10/test.csv \
  --num_train_epochs 15 \
  --tokenize_config . \
  --w2v2_bert_model facebook/w2v-bert-2.0 \
  --batch 4 \
  --num_proc 5 \
  --grad_accum 1 \
  --learning_rate 3e-5 \
  --logging_steps 20 \
  --eval_step 500 \
  --group_by_length \
  --attention_dropout 0.0 \
  --activation_dropout 0.05 \
  --feat_proj_dropout 0.05 \
  --feat_quantizer_dropout 0.0 \
  --hidden_dropout 0.05 \
  --layerdrop 0.0 \
  --final_dropout 0.0 \
  --mask_time_prob 0.0 \
  --mask_time_length 10 \
  --mask_feature_prob 0.0 \
  --mask_feature_length 10
```
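
The WER/CER figures above can be recomputed from reference transcripts and model outputs. A minimal scoring sketch using the `jiwer` package (an assumption; any WER/CER implementation works), given two parallel lists of strings:

```python
import jiwer

# references: ground-truth transcripts; hypotheses: model predictions, in the same order.
references = ["перший приклад", "другий приклад"]
hypotheses = ["перший приклад", "другий приклад разом"]

wer = jiwer.wer(references, hypotheses)
cer = jiwer.cer(references, hypotheses)
print(f"WER: {wer:.4f}  CER: {cer:.4f}")
```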