SpeechT5 STT Wav2Vec2

This model is a fine-tuned version of facebook/wav2vec2-base-960h on the Lj-Speech dataset. It achieves the following results on the evaluation set:

  • Loss: 509.3571

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
  • mixed_precision_training: Native AMP
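
The relationship between these settings can be checked with a short sketch. It reproduces the effective batch size arithmetic and a linear warmup and decay schedule of the kind named by `lr_scheduler_type: linear`. The total number of optimizer steps is not stated in this card, so `ASSUMED_TOTAL_STEPS` is a placeholder, and the single-device setting (`num_devices=1`) is inferred from 16 × 2 = 32 rather than stated. The helper names are hypothetical and not part of the training script.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100

# ASSUMPTION: the card does not state the total number of optimizer steps.
ASSUMED_TOTAL_STEPS = 1100


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accumulation * num_devices


def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = LEARNING_RATE,
                     warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 50, 100, 600, ASSUMED_TOTAL_STEPS):
        print(step, linear_warmup_lr(step, ASSUMED_TOTAL_STEPS))
```

With these values the learning rate peaks at 0.0001 at step 100, which is where the validation loss in the results first drops below 1000.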

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 19536.3975    | 0.1376 | 50   | 8125.4639       |
| 2673.9081     | 0.2751 | 100  | 909.8571        |
| 1958.4278     | 0.4127 | 150  | 544.6085        |
| 1268.7548     | 0.5502 | 200  | 555.2729        |
| 1504.7081     | 0.6878 | 250  | 520.7637        |
| 1322.1669     | 0.8253 | 300  | 572.5987        |
| 1331.9734     | 0.9629 | 350  | 514.8672        |
| 1149.1491     | 1.1004 | 400  | 525.9183        |
| 1063.02       | 1.2380 | 450  | 511.6159        |
| 1063.2695     | 1.3755 | 500  | 521.9377        |
| 1037.6037     | 1.5131 | 550  | 511.7293        |
| 1065.5638     | 1.6506 | 600  | 510.2425        |
| 1025.7576     | 1.7882 | 650  | 506.2704        |
| 1132.412      | 1.9257 | 700  | 525.5427        |
| 1033.8723     | 2.0633 | 750  | 506.9381        |
| 1027.0328     | 2.2008 | 800  | 513.5829        |
| 1024.9632     | 2.3384 | 850  | 518.4105        |
| 1023.1637     | 2.4759 | 900  | 515.6079        |
| 1006.7498     | 2.6135 | 950  | 513.5686        |
| 1026.8645     | 2.7510 | 1000 | 507.8027        |
| 1026.9354     | 2.8886 | 1050 | 509.3571        |
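
As a rough aid for reading these results, the sketch below finds the evaluation step with the lowest validation loss and estimates the number of optimizer steps per epoch from the logged epoch and step values. The rows are copied from the results above. The derived training-set size is only an estimate, since the card does not state it and the final batch of an epoch may be partial.

```python
# Rows copied from the training results: (training loss, epoch, step, validation loss).
RESULTS = [
    (19536.3975, 0.1376, 50, 8125.4639),
    (2673.9081, 0.2751, 100, 909.8571),
    (1958.4278, 0.4127, 150, 544.6085),
    (1268.7548, 0.5502, 200, 555.2729),
    (1504.7081, 0.6878, 250, 520.7637),
    (1322.1669, 0.8253, 300, 572.5987),
    (1331.9734, 0.9629, 350, 514.8672),
    (1149.1491, 1.1004, 400, 525.9183),
    (1063.02, 1.2380, 450, 511.6159),
    (1063.2695, 1.3755, 500, 521.9377),
    (1037.6037, 1.5131, 550, 511.7293),
    (1065.5638, 1.6506, 600, 510.2425),
    (1025.7576, 1.7882, 650, 506.2704),
    (1132.412, 1.9257, 700, 525.5427),
    (1033.8723, 2.0633, 750, 506.9381),
    (1027.0328, 2.2008, 800, 513.5829),
    (1024.9632, 2.3384, 850, 518.4105),
    (1023.1637, 2.4759, 900, 515.6079),
    (1006.7498, 2.6135, 950, 513.5686),
    (1026.8645, 2.7510, 1000, 507.8027),
    (1026.9354, 2.8886, 1050, 509.3571),
]
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list

# Row with the lowest validation loss.
best = min(RESULTS, key=lambda row: row[3])
best_step, best_val_loss = best[2], best[3]

# Steps per epoch implied by the last logged row (step / epoch).
_, last_epoch, last_step, final_val_loss = RESULTS[-1]
steps_per_epoch = last_step / last_epoch

# ESTIMATE only: ignores a possibly partial final batch.
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"lowest validation loss {best_val_loss} at step {best_step}")
    print(f"final validation loss {final_val_loss} at step {last_step}")
    print(f"about {steps_per_epoch:.1f} steps per epoch")
    print(f"roughly {approx_train_examples:.0f} training examples")
```

The lowest validation loss in the log is 506.2704 at step 650, slightly below the final value of 509.3571 reported at the top of this card.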

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1

Model size: 94.4M params (Safetensors, tensor type F32)