librarian-bot's picture
Librarian Bot: Add base_model information to model
183b1d9
|
raw
history blame
4.14 kB
metadata
language:
  - ar
  - multilingual
license: apache-2.0
tags:
  - automatic-speech-recognition
  - hf-asr-leaderboard
  - whisper-event
  - generated_from_trainer
  - Arabic
  - multilingual
  - STT
datasets:
  - mozilla-foundation/common_voice_12_0
metrics:
  - wer
base_model: openai/whisper-small
model-index:
  - name: Kalemat-Tech Arabic Speech Recognition Model (STT)
    results:
      - task:
          type: automatic-speech-recognition
          name: Speech Recognition
        dataset:
          name: mozilla-foundation/common_voice_12_0
          type: mozilla-foundation/common_voice_12_0
          config: ar
          split: test
          args: ar
        metrics:
          - type: wer
            value: 58.5848
            name: wer

Kalemat-Tech Arabic Speech Recognition Model (STT) - Mohamed Salama

نموذج كلماتك للتعرف على الأصوات العربية الفصحى و تحويلها إلى نصوص

KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small

This model is a fine-tuned version of openai/whisper-small on Common_Voice_Arabic_12.0_Augmented. It achieves the following results on the evaluation set:

  • Loss: 0.5362
  • Wer: 58.5848

Example of usage:

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")

model = AutoModelForSpeechSeq2Seq.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")

Intended uses & limitations

Automatic Speech Recognition

Training and evaluation data

Common_Voice_Arabic_12.0 and I made some augmentations to it as follows:
- 25% of the data TimeMasking
- 25% of the data SpecAugmentation
- 25% of the data WavAugmentation (AddGaussianNoise)
- The final dataset is the original common voice plus the augmented files

Training procedure

Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss Wer
0.2728 1.01 1000 0.3063 60.4733
0.1442 2.01 2000 0.2878 55.6935
0.0648 3.02 3000 0.3009 59.2568
0.0318 4.03 4000 0.3278 59.2993
0.0148 5.04 5000 0.3539 61.0364
0.0088 6.04 6000 0.3714 56.9154
0.0061 7.05 7000 0.3920 57.5515
0.0041 8.06 8000 0.4149 61.6328
0.0033 9.06 9000 0.4217 58.0310
0.0033 10.07 10000 0.4376 59.9594
0.0021 11.08 11000 0.4485 56.7812
0.0015 12.08 12000 0.4577 57.6936
0.0013 13.09 13000 0.4671 60.6606
0.0011 14.1 14000 0.4686 59.8159
0.0008 15.11 15000 0.4856 60.7111
0.0011 16.11 16000 0.4851 59.5198
0.0005 17.12 17000 0.4936 59.2608
0.0004 18.13 18000 0.4995 57.9619
0.0003 19.13 19000 0.5085 58.3630
0.0002 20.14 20000 0.5155 58.0987
0.0001 21.15 21000 0.5251 58.8504
0.0001 22.16 22000 0.5268 58.4228
0.0001 23.16 23000 0.5317 59.0881
0.0001 24.17 24000 0.5362 58.5848

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cu117
  • Datasets 2.8.0
  • Tokenizers 0.13.2