
wav2vec-bert-2.0-even-pakendorf-0406-1347

This model is a fine-tuned version of facebook/w2v-bert-2.0 on the audiofolder dataset. It achieves the following results on the evaluation set:

  • Cer: 0.2128
  • Loss: inf
  • Wer: 0.5969
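WER and CER are the word- and character-level error rates (edit distance divided by reference length). As a point of reference, the sketch below shows how such scores can be computed with the Hugging Face evaluate library; this is an illustration with placeholder strings and an assumed metric library, not the exact evaluation script behind the numbers above.

# Hedged sketch: computing WER/CER with the `evaluate` library (requires `jiwer`).
# The reference/prediction strings are placeholders, not data from this model card.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["a reference transcription"]
predictions = ["a predicted transcription"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))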

Model description

How to use:

import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model = AutoModelForCTC.from_pretrained("tbkazakova/wav2vec-bert-2.0-even-pakendorf")
processor = Wav2Vec2BertProcessor.from_pretrained("tbkazakova/wav2vec-bert-2.0-even-pakendorf")

# Load the audio and resample it to the 16 kHz rate the model expects
data, sampling_rate = librosa.load('audio.wav')
data = librosa.resample(data, orig_sr=sampling_rate, target_sr=16000)

# Extract input features and run CTC inference
inputs = processor(data, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_features).logits

# Greedy decoding: take the most likely token at each frame
pred_ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(pred_ids))

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 10
  • mixed_precision_training: Native AMP
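For orientation, the hyperparameters above map directly onto transformers.TrainingArguments. The following is a minimal, hedged sketch of an equivalent configuration; the output_dir and the surrounding training script are assumptions, not taken from this card.

# Hedged sketch: these hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec-bert-2.0-even-pakendorf-0406-1347",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective total train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    fp16=True,                       # mixed precision training via native AMP
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 corresponds to the optimizer defaults.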

Training results

| Training Loss | Epoch  | Step | Cer    | Validation Loss | Wer    |
|---------------|--------|------|--------|-----------------|--------|
| 4.5767        | 0.5051 | 200  | 0.4932 | inf             | 0.9973 |
| 1.8775        | 1.0101 | 400  | 0.3211 | inf             | 0.8494 |
| 1.6006        | 1.5152 | 600  | 0.3017 | inf             | 0.8040 |
| 1.4476        | 2.0202 | 800  | 0.2896 | inf             | 0.7534 |
| 1.2213        | 2.5253 | 1000 | 0.2610 | inf             | 0.7080 |
| 1.1485        | 3.0303 | 1200 | 0.2684 | inf             | 0.6800 |
| 0.9554        | 3.5354 | 1400 | 0.2459 | inf             | 0.6732 |
| 0.9379        | 4.0404 | 1600 | 0.2275 | inf             | 0.6251 |
| 0.7644        | 4.5455 | 1800 | 0.2235 | inf             | 0.6224 |
| 0.7891        | 5.0505 | 2000 | 0.2180 | inf             | 0.6053 |
| 0.633         | 5.5556 | 2200 | 0.2130 | inf             | 0.5996 |
| 0.6197        | 6.0606 | 2400 | 0.2126 | inf             | 0.6032 |
| 0.5212        | 6.5657 | 2600 | 0.2196 | inf             | 0.6019 |
| 0.4881        | 7.0707 | 2800 | 0.2125 | inf             | 0.5894 |
| 0.4           | 7.5758 | 3000 | 0.2066 | inf             | 0.5852 |
| 0.4008        | 8.0808 | 3200 | 0.2076 | inf             | 0.5790 |
| 0.3304        | 8.5859 | 3400 | 0.2096 | inf             | 0.5884 |
| 0.3446        | 9.0909 | 3600 | 0.2124 | inf             | 0.5983 |
| 0.3237        | 9.5960 | 3800 | 0.2128 | inf             | 0.5969 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1