---
language:
- ar
- multilingual
license: apache-2.0
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- whisper-event
- generated_from_trainer
- Arabic
- multilingual
- STT
datasets:
- mozilla-foundation/common_voice_12_0
metrics:
- wer
base_model: openai/whisper-small
model-index:
- name: Kalemat-Tech Arabic Speech Recognition Model (STT)
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_12_0
      type: mozilla-foundation/common_voice_12_0
      config: ar
      split: test
      args: ar
    metrics:
    - type: wer
      value: 58.5848
      name: wer
---
# Kalemat-Tech Arabic Speech Recognition Model (STT) - Mohamed Salama
# KalemaTech model for recognizing Modern Standard Arabic speech and converting it to text
# KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on Common_Voice_Arabic_12.0_Augmented.
It achieves the following results on the evaluation set at the final logged checkpoint (step 24000):
- Loss: 0.5362
- WER: 58.5848
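For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal standard-library illustration that reports the value as a percentage, which is how the figure above appears to be expressed. It is not the evaluation code used for this model, and any text normalization applied before scoring (for example, diacritic removal) is not stated in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between a reference and a hypothesis transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Hypothetical toy transcripts: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.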
## Usage example
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Load the processor (feature extractor + tokenizer) and the fine-tuned model from the Hub.
processor = AutoProcessor.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small")
```
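The snippet above only loads the model. A possible next step, transcribing a waveform, might look like the sketch below. The `transcribe` helper and the `SAMPLE_RATE` constant are illustrative names that do not come from this card, and the sketch assumes the audio has already been loaded as a mono, 16 kHz, one-dimensional float array, which is the input format Whisper models expect.

```python
MODEL_ID = "Salama1429/KalemaTech-Arabic-STT-ASR-based-on-Whisper-Small"
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def transcribe(waveform, sampling_rate=SAMPLE_RATE):
    """Transcribe a 1-D float waveform (hypothetical helper, not part of the card)."""
    # Imported lazily so that defining this helper does not load the heavy dependencies.
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(MODEL_ID)

    # Convert the raw waveform to log-mel input features.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # Generate token ids without tracking gradients, then decode them to text.
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

The waveform can come from any audio loader that resamples to 16 kHz. For repeated calls it would be better to load the processor and model once and reuse them.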
## Intended uses & limitations
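The model is intended for automatic speech recognition (speech-to-text) of Arabic audio. Its main known limitation is accuracy: the WER on the Common Voice 12.0 Arabic test split is 58.5848.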
## Training and evaluation data
The training data is Common Voice Arabic 12.0, augmented as follows:
- TimeMasking applied to 25% of the data
- SpecAugmentation applied to 25% of the data
- WavAugmentation (AddGaussianNoise) applied to 25% of the data

The final dataset is the original Common Voice data plus the augmented files.
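The augmentation split described above can be pictured with a small planning sketch. It only decides which clip receives which augmentation and does not process any audio. The card does not say whether the three 25% subsets overlap, how clips were selected, or which seed was used, so the disjoint subsets, the `plan_augmentations` helper, and the file names below are all assumptions made for illustration.

```python
import random

# Labels mirror the three augmentation types named in this card.
AUGMENTATIONS = ["time_masking", "spec_augmentation", "gaussian_noise"]


def plan_augmentations(files, fraction=0.25, seed=42):
    """Assign a disjoint `fraction` of the clips to each augmentation type (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)

    per_type = int(len(shuffled) * fraction)
    plan = []
    for index, name in enumerate(AUGMENTATIONS):
        subset = shuffled[index * per_type:(index + 1) * per_type]
        plan.extend((path, name) for path in subset)
    return plan


files = [f"clip_{i:04d}.mp3" for i in range(1000)]  # hypothetical file names
plan = plan_augmentations(files)

# Final dataset: every original clip plus one augmented copy per planned pair.
final_size = len(files) + len(plan)
print(final_size)
```

Under these assumptions the augmented dataset is 1.75 times the size of the original one.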
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 25
- mixed_precision_training: Native AMP
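To make the `linear` scheduler with 500 warmup steps concrete, the sketch below computes the learning rate at a given step: it rises linearly from 0 to the base rate during warmup, then decays linearly to 0 at the end of training. The total number of training steps is not stated in this card, so `TOTAL_STEPS` is a hypothetical round value chosen only for illustration.

```python
BASE_LR = 1e-05       # learning_rate from the list above
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 25_000  # ASSUMPTION: hypothetical value, not stated in this card


def linear_schedule_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS,
                       total_steps=TOTAL_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 12_750, 25_000):
    print(step, linear_schedule_lr(step))
```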
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|
| 0.2728 | 1.01 | 1000 | 0.3063 | 60.4733 |
| 0.1442 | 2.01 | 2000 | 0.2878 | 55.6935 |
| 0.0648 | 3.02 | 3000 | 0.3009 | 59.2568 |
| 0.0318 | 4.03 | 4000 | 0.3278 | 59.2993 |
| 0.0148 | 5.04 | 5000 | 0.3539 | 61.0364 |
| 0.0088 | 6.04 | 6000 | 0.3714 | 56.9154 |
| 0.0061 | 7.05 | 7000 | 0.3920 | 57.5515 |
| 0.0041 | 8.06 | 8000 | 0.4149 | 61.6328 |
| 0.0033 | 9.06 | 9000 | 0.4217 | 58.0310 |
| 0.0033 | 10.07 | 10000 | 0.4376 | 59.9594 |
| 0.0021 | 11.08 | 11000 | 0.4485 | 56.7812 |
| 0.0015 | 12.08 | 12000 | 0.4577 | 57.6936 |
| 0.0013 | 13.09 | 13000 | 0.4671 | 60.6606 |
| 0.0011 | 14.1 | 14000 | 0.4686 | 59.8159 |
| 0.0008 | 15.11 | 15000 | 0.4856 | 60.7111 |
| 0.0011 | 16.11 | 16000 | 0.4851 | 59.5198 |
| 0.0005 | 17.12 | 17000 | 0.4936 | 59.2608 |
| 0.0004 | 18.13 | 18000 | 0.4995 | 57.9619 |
| 0.0003 | 19.13 | 19000 | 0.5085 | 58.3630 |
| 0.0002 | 20.14 | 20000 | 0.5155 | 58.0987 |
| 0.0001 | 21.15 | 21000 | 0.5251 | 58.8504 |
| 0.0001 | 22.16 | 22000 | 0.5268 | 58.4228 |
| 0.0001 | 23.16 | 23000 | 0.5317 | 59.0881 |
| 0.0001 | 24.17 | 24000 | 0.5362 | 58.5848 |
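
The headline metrics correspond to the last row of the table, but other rows score differently. The sketch below copies the step, validation loss, and WER columns from the table and looks up the row with the lowest value of each. It is only a reading aid for the table. Whether intermediate checkpoints were saved is not stated in this card.

```python
# (step, validation_loss, wer) rows copied from the training results table.
ROWS = [
    (1000, 0.3063, 60.4733), (2000, 0.2878, 55.6935), (3000, 0.3009, 59.2568),
    (4000, 0.3278, 59.2993), (5000, 0.3539, 61.0364), (6000, 0.3714, 56.9154),
    (7000, 0.3920, 57.5515), (8000, 0.4149, 61.6328), (9000, 0.4217, 58.0310),
    (10000, 0.4376, 59.9594), (11000, 0.4485, 56.7812), (12000, 0.4577, 57.6936),
    (13000, 0.4671, 60.6606), (14000, 0.4686, 59.8159), (15000, 0.4856, 60.7111),
    (16000, 0.4851, 59.5198), (17000, 0.4936, 59.2608), (18000, 0.4995, 57.9619),
    (19000, 0.5085, 58.3630), (20000, 0.5155, 58.0987), (21000, 0.5251, 58.8504),
    (22000, 0.5268, 58.4228), (23000, 0.5317, 59.0881), (24000, 0.5362, 58.5848),
]

# Pick the rows with the lowest WER and the lowest validation loss.
best_by_wer = min(ROWS, key=lambda row: row[2])
best_by_loss = min(ROWS, key=lambda row: row[1])
final_row = ROWS[-1]

print("lowest WER:", best_by_wer)
print("lowest validation loss:", best_by_loss)
print("final checkpoint:", final_row)
```

Both minima fall at step 2000, after which validation loss rises while training loss keeps falling, a pattern that usually indicates overfitting.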
### Framework versions
- Transformers 4.25.1
- PyTorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2