---
language:
- sr
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
datasets:
- espnet/yodas
- google/fleurs
- Sagicc/audio-lmb-ds
- mozilla-foundation/common_voice_16_1
metrics:
- wer
model-index:
- name: Whisper Small Sr Yodas
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 16_1
      type: mozilla-foundation/common_voice_16_1
      config: sr
      split: test
      args: sr
    metrics:
    - name: Wer
      type: wer
      value: 0.12195981670778992
---


# Whisper Small Sr Yodas

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on a merged dataset that combines Common Voice 16.1, Fleurs, [Juzne vesti (South news)](http://hdl.handle.net/11356/1679), [LBM](https://huggingface.co/datasets/Sagicc/audio-lmb-ds), and [Yodas](https://huggingface.co/datasets/espnet/yodas). The Juzne vesti dataset is published as:

Rupnik, Peter and Ljubešić, Nikola, 2022,\
  ASR training dataset for Serbian JuzneVesti-SR v1.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042,\
  http://hdl.handle.net/11356/1679.
  
At the final checkpoint (step 20000), the model achieves the following results on the evaluation set:
- Loss: 0.3584
- Wer Ortho: 0.2328
- Wer: 0.1220
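
The card does not say how the two WER figures differ. A minimal sketch, assuming the common convention that "Wer Ortho" is scored on raw text while "Wer" is scored after text normalization: the `normalize` function below (lowercasing and stripping punctuation) is a hypothetical stand-in, not the normalizer used for this model, and the example sentences are made up for illustration.

```python
import re


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref_words, hypothesis.split()) / len(ref_words)


def normalize(text):
    # Assumed normalizer (hypothetical): lowercase and drop punctuation.
    return re.sub(r"[^\w\s]", "", text.lower())


# Illustrative sentences only, not taken from the evaluation set.
reference = "Dobar dan, kako ste?"
hypothesis = "dobar dan kako ste"

wer_ortho = wer(reference, hypothesis)
wer_normalized = wer(normalize(reference), normalize(hypothesis))
print(f"orthographic WER: {wer_ortho:.2f}")
print(f"normalized WER:   {wer_normalized:.2f}")
```

Under this assumption, casing and punctuation mismatches count as errors only in the orthographic score, which would be consistent with "Wer Ortho" being the higher of the two figures.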

## Model description

This version adds the Yodas dataset to the training mix as an experiment, to test whether it improves results.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 10
- mixed_precision_training: Native AMP
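
To make the `linear` scheduler with 50 warmup steps concrete, here is a rough sketch of the learning rate at a given optimizer step: a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0. The function name `linear_schedule_lr` is hypothetical, and `total_steps=20_000` is an assumption based on the last logged step; the card does not state the exact total.

```python
def linear_schedule_lr(step, peak_lr=1e-5, warmup_steps=50, total_steps=20_000):
    """Learning rate at `step` for linear warmup followed by linear decay.

    total_steps is an assumed value; the exact total is not stated in the card.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 25, 50, 10_000, 20_000):
    print(step, linear_schedule_lr(step))
```

With so few warmup steps relative to the run length, the schedule is almost entirely decay: the rate peaks at step 50 and falls for the rest of training.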

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer Ortho | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|
| 0.6958        | 0.49  | 1000  | 0.2114          | 0.2528    | 0.1563 |
| 0.5941        | 0.98  | 2000  | 0.1857          | 0.2214    | 0.1269 |
| 0.3985        | 1.46  | 3000  | 0.1729          | 0.2106    | 0.1167 |
| 0.4187        | 1.95  | 4000  | 0.1745          | 0.2120    | 0.1147 |
| 0.3446        | 2.44  | 5000  | 0.1770          | 0.2074    | 0.1139 |
| 0.2992        | 2.93  | 6000  | 0.1710          | 0.2048    | 0.1061 |
| 0.2074        | 3.42  | 7000  | 0.1887          | 0.2090    | 0.1123 |
| 0.1958        | 3.91  | 8000  | 0.1871          | 0.2136    | 0.1131 |
| 0.1707        | 4.39  | 9000  | 0.2069          | 0.2230    | 0.1126 |
| 0.1403        | 4.88  | 10000 | 0.2092          | 0.2138    | 0.1110 |
| 0.0871        | 5.37  | 11000 | 0.2345          | 0.2216    | 0.1161 |
| 0.0856        | 5.86  | 12000 | 0.2384          | 0.2281    | 0.1161 |
| 0.0496        | 6.35  | 13000 | 0.2657          | 0.2327    | 0.1211 |
| 0.0542        | 6.84  | 14000 | 0.2760          | 0.2346    | 0.1198 |
| 0.0274        | 7.32  | 15000 | 0.3024          | 0.2304    | 0.1218 |
| 0.0281        | 7.81  | 16000 | 0.3134          | 0.2357    | 0.1216 |
| 0.0151        | 8.3   | 17000 | 0.3328          | 0.2276    | 0.1188 |
| 0.0165        | 8.79  | 18000 | 0.3417          | 0.2348    | 0.1220 |
| 0.0094        | 9.28  | 19000 | 0.3545          | 0.2318    | 0.1221 |
| 0.0125        | 9.77  | 20000 | 0.3584          | 0.2328    | 0.1220 |
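
In the table above, training loss keeps falling while validation loss rises after step 6000, a pattern that commonly indicates overfitting. As a small sketch, the rows can be scanned to find the checkpoint with the lowest WER; the values are copied from the table, and the variable names are illustrative only.

```python
# (step, validation_loss, wer_ortho, wer), copied from the results table.
rows = [
    (1000, 0.2114, 0.2528, 0.1563),
    (2000, 0.1857, 0.2214, 0.1269),
    (3000, 0.1729, 0.2106, 0.1167),
    (4000, 0.1745, 0.2120, 0.1147),
    (5000, 0.1770, 0.2074, 0.1139),
    (6000, 0.1710, 0.2048, 0.1061),
    (7000, 0.1887, 0.2090, 0.1123),
    (8000, 0.1871, 0.2136, 0.1131),
    (9000, 0.2069, 0.2230, 0.1126),
    (10000, 0.2092, 0.2138, 0.1110),
    (11000, 0.2345, 0.2216, 0.1161),
    (12000, 0.2384, 0.2281, 0.1161),
    (13000, 0.2657, 0.2327, 0.1211),
    (14000, 0.2760, 0.2346, 0.1198),
    (15000, 0.3024, 0.2304, 0.1218),
    (16000, 0.3134, 0.2357, 0.1216),
    (17000, 0.3328, 0.2276, 0.1188),
    (18000, 0.3417, 0.2348, 0.1220),
    (19000, 0.3545, 0.2318, 0.1221),
    (20000, 0.3584, 0.2328, 0.1220),
]

best_by_wer = min(rows, key=lambda row: row[3])
best_by_loss = min(rows, key=lambda row: row[1])
final = rows[-1]

print(f"lowest WER:             step {best_by_wer[0]}, WER {best_by_wer[3]:.4f}")
print(f"lowest validation loss: step {best_by_loss[0]}, loss {best_by_loss[1]:.4f}")
print(f"final checkpoint:       step {final[0]}, WER {final[3]:.4f}")
```

Both criteria point to step 6000 (WER 0.1061, validation loss 0.1710), whereas the figures reported at the top of this card come from the final step, 20000.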


### Framework versions

- Transformers 4.39.3
- Pytorch 2.0.1+cu117
- Datasets 2.18.0
- Tokenizers 0.15.1