metadata
license: mit
base_model: facebook/w2v-bert-2.0
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: malayalam_combined_Extempore
    results: []

malayalam_combined_Extempore

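This model is a fine-tuned version of facebook/w2v-bert-2.0. The training dataset is not recorded (the auto-generated card lists it as None). At the final logged evaluation (step 5000), it achieves the following results on the evaluation set: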

  • Loss: 0.4866
  • WER (word error rate): 0.4837
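
The card does not say how the word error rate was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), it could be reproduced as follows. The function name `word_error_rate` and the example sentences are hypothetical and are not part of the training code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion
# (the second "the"), so 2 errors over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 0.4837 corresponds to roughly 48 word-level errors per 100 reference words.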

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 10
  • mixed_precision_training: Native AMP
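
Two of the values above follow from the others, and a short sketch may make the relationship clearer. It assumes a single training device (consistent with 16 × 2 = 32) and assumes the common form of a linear schedule: a linear ramp from zero during warmup, then a linear decay to zero. `TOTAL_STEPS` is set to the last logged step as an assumption, since the card does not state the true total. The helper names are hypothetical.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 50
TOTAL_STEPS = 5000  # assumption: last logged step; the real total is not stated


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int,
              peak_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_train_batch_size = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
lr_mid_warmup = linear_lr(25)   # halfway through warmup
lr_peak = linear_lr(50)         # end of warmup
lr_final = linear_lr(5000)      # end of the assumed schedule
```

With a 50-step warmup out of several thousand steps, the run spends almost all of its time in the decay phase.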

Training results

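| Training Loss | Epoch  | Step | Validation Loss | WER    |
|---------------|--------|------|-----------------|--------|
| 0.8139        | 0.9794 | 500  | 0.8389          | 0.6821 |
| 0.6539        | 1.9589 | 1000 | 0.6815          | 0.6041 |
| 0.5383        | 2.9383 | 1500 | 0.5827          | 0.5705 |
| 0.4772        | 3.9177 | 2000 | 0.5398          | 0.5548 |
| 0.4351        | 4.8972 | 2500 | 0.5342          | 0.5407 |
| 0.3866        | 5.8766 | 3000 | 0.5411          | 0.5174 |
| 0.3567        | 6.8560 | 3500 | 0.5063          | 0.5085 |
| 0.3047        | 7.8355 | 4000 | 0.4886          | 0.4986 |
| 0.2879        | 8.8149 | 4500 | 0.4878          | 0.4884 |
| 0.2648        | 9.7943 | 5000 | 0.4866          | 0.4837 |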
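
The logged results can be summarized with a few lines of arithmetic. This sketch copies the rows above verbatim and derives the best checkpoint by each metric, the relative WER reduction between the first and last evaluation, and any intervals where validation loss rose. The variable names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the results above
rows = [
    (0.8139, 0.9794, 500, 0.8389, 0.6821),
    (0.6539, 1.9589, 1000, 0.6815, 0.6041),
    (0.5383, 2.9383, 1500, 0.5827, 0.5705),
    (0.4772, 3.9177, 2000, 0.5398, 0.5548),
    (0.4351, 4.8972, 2500, 0.5342, 0.5407),
    (0.3866, 5.8766, 3000, 0.5411, 0.5174),
    (0.3567, 6.8560, 3500, 0.5063, 0.5085),
    (0.3047, 7.8355, 4000, 0.4886, 0.4986),
    (0.2879, 8.8149, 4500, 0.4878, 0.4884),
    (0.2648, 9.7943, 5000, 0.4866, 0.4837),
]

# Step of the evaluation with the lowest validation loss and the lowest WER.
best_step_by_loss = min(rows, key=lambda r: r[3])[2]
best_step_by_wer = min(rows, key=lambda r: r[4])[2]

# Relative WER reduction from the first to the last logged evaluation.
first_wer, last_wer = rows[0][4], rows[-1][4]
relative_wer_reduction = (first_wer - last_wer) / first_wer

# Consecutive evaluations where validation loss went up instead of down.
loss_increases = [(a[2], b[2]) for a, b in zip(rows, rows[1:]) if b[3] > a[3]]
```

Both metrics are best at the last logged evaluation. Validation loss rises once, between steps 2500 and 3000, while WER falls at every evaluation, and both curves are still improving slowly when logging ends.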

Framework versions

  • Transformers 4.43.0.dev0
  • PyTorch 1.14.0a0+44dac51
  • Datasets 2.16.1
  • Tokenizers 0.19.1