---
language:
  - zh
license: apache-2.0
tags:
  - whisper-event
  - generated_from_trainer
datasets:
  - mozilla-foundation/common_voice_11_0
model-index:
  - name: Whisper Small zh-HK - Alvin
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: mozilla-foundation/common_voice_11_0 zh-HK
          type: mozilla-foundation/common_voice_11_0
          config: zh-HK
          split: test
          args: zh-HK
        metrics:
          - name: CER
            type: cer
            value: 10.11
---

Whisper Small zh-HK - Alvin

This model is a fine-tuned version of openai/whisper-small on the Common Voice 11.0 zh-HK dataset. Its CER is roughly 1 percentage point lower than that of the previous version.
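A minimal inference sketch using the transformers ASR pipeline. The repo id below is a placeholder assumption (substitute the actual Hugging Face model id of this checkpoint), and forcing `language="zh"` with `task="transcribe"` is one way to keep Whisper from auto-detecting the language or translating:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with this checkpoint's actual model id.
asr = pipeline(
    "automatic-speech-recognition",
    model="alvanli/whisper-small-cantonese",
)

# Force transcription in Chinese rather than language auto-detection
# or translation to English.
result = asr(
    "sample.wav",
    generate_kwargs={"language": "zh", "task": "transcribe"},
)
print(result["text"])
```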

Training and evaluation data

For training, three datasets were used:

  • Common Voice 11 Canto Train Set
  • CantoMap: Winterstein, Grégoire, Tang, Carmen and Lai, Regine (2020) "CantoMap: a Hong Kong Cantonese MapTask Corpus", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, p. 2899-2906.
  • Cantonese-ASR: Yu, Tiezheng, Frieske, Rita, Xu, Peng, Cahyawijaya, Samuel, Yiu, Cheuk Tung, Lovenia, Holy, Dai, Wenliang, Barezi, Elham, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram, and Fung, Pascale (2022) "Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset". Link: https://arxiv.org/pdf/2201.02419.pdf

Training Hyperparameters

  • learning_rate: 5e-5
  • train_batch_size: 25 (on 2 GPUs)
  • eval_batch_size: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64 (16 per device × 2 GPUs × 2 gradient accumulation steps)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 14000
  • mixed_precision_training: Native AMP
  • augmentation: SpecAugment
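The total train batch size above follows from the per-device batch size, the number of GPUs, and the gradient accumulation steps (the 16×2×2 breakdown listed in the hyperparameters). A quick sketch of that arithmetic:

```python
# Effective (total) train batch size, as implied by the 16x2x2 breakdown
# in the hyperparameter list above.
per_device_batch = 16
num_gpus = 2
grad_accum_steps = 2

effective_batch = per_device_batch * num_gpus * grad_accum_steps
print(effective_batch)  # 64
```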

Training Results

| Training Loss | Epoch | Step  | Validation Loss | CER   |
|--------------:|------:|------:|----------------:|------:|
| 0.4610        | 0.55  | 2000  | 0.3106          | 13.08 |
| 0.3441        | 1.11  | 4000  | 0.2875          | 11.79 |
| 0.3466        | 1.66  | 6000  | 0.2820          | 11.44 |
| 0.2539        | 2.22  | 8000  | 0.2777          | 10.59 |
| 0.2312        | 2.77  | 10000 | 0.2822          | 10.60 |
| 0.1639        | 3.32  | 12000 | 0.2859          | 10.17 |
| 0.1569        | 3.88  | 14000 | 0.2866          | 10.11 |
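The CER values reported above are character error rates: the Levenshtein edit distance between reference and hypothesis characters, divided by the reference length, expressed as a percentage. A minimal self-contained sketch (the example strings are illustrative, not from the evaluation set):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Classic dynamic-programming Levenshtein distance over characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a percentage of the reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)

# One deleted character out of four: CER = 100 * 1/4 = 25.0
print(cer("你好香港", "你好港"))  # 25.0
```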