Whisper Base for Korean Low-Quality Call Voices
This model is a fine-tuned version of openai/whisper-base on the Korean Low-Quality Call Voices dataset. It achieves the following results on the evaluation set at the final checkpoint (step 8000):
- Loss: 0.4941
- CER (character error rate, %): 30.7538
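CER is the character-level edit distance between the model's transcripts and the reference transcripts, divided by the number of reference characters. The following is a minimal sketch of a corpus-level CER in percent, assuming every character (including spaces) counts. The metric used during training (for example the `cer` metric in Hugging Face `evaluate`) may normalize text differently, so this sketch is not guaranteed to reproduce 30.7538 exactly. `levenshtein` and `cer_percent` are hypothetical helper names, and the Korean strings are made-up illustrations.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions
    separating `ref` and `hyp` (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference character left unmatched
                curr[j - 1] + 1,         # hypothesis character left unmatched
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer_percent(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters, times 100."""
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * edits / chars


# Hypothetical example: one wrong syllable out of five reference characters.
print(cer_percent(["안녕하세요"], ["안녕하새요"]))  # 20.0
```

Because the errors are pooled over the whole evaluation set before dividing, long utterances weigh more than short ones, which is the usual convention for corpus-level CER.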
Model description
This model was fine-tuned for project use. Starting from OpenAI's Whisper-Base model, it was fine-tuned to improve recognition accuracy on Korean low-quality voice call data. The training data is a subset of AI-HUB's "low-quality telephone network speech recognition data": 240,771.06 seconds of audio (about 5.296 seconds per file on average) with 1,696,414 characters of transcript text.
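The dataset figures above can be turned into a few derived quantities with simple arithmetic. This sketch uses only the numbers stated in this card. The clip count is an estimate, because it divides the total duration by the rounded average clip length rather than using an actual file count.

```python
TOTAL_SECONDS = 240_771.06   # total audio duration stated in this card
AVG_SECONDS = 5.296          # stated average clip length (approximate)
TOTAL_CHARS = 1_696_414      # stated transcript size in characters

est_files = TOTAL_SECONDS / AVG_SECONDS          # rough clip count implied by the average
hours = TOTAL_SECONDS / 3600                     # total duration in hours
chars_per_second = TOTAL_CHARS / TOTAL_SECONDS   # transcript density of the speech
chars_per_clip = TOTAL_CHARS / est_files         # average transcript length per clip

print(f"~{est_files:,.0f} clips, {hours:.1f} h, "
      f"{chars_per_second:.2f} chars/s, ~{chars_per_clip:.1f} chars/clip")
```

With the stated figures, this works out to roughly 45,000 clips, about 67 hours of audio, and about 7 transcript characters per second of speech.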
Intended uses & limitations
ํ์ธํ๋์ ์ฌ์ฉ๋ Base model๊ณผ dataset ๋ชจ๋ ํ์ต ๋ชฉ์ ์ผ๋ก ์ฌ์ฉํ์์ผ๋ฉฐ, ๋ฐ๋ผ์ ๋ณธ ๋ชจ๋ธ ์ญ์ ํ์ต ๋ชฉ์ ์ผ๋ก๋ง ์ฌ์ฉ ๊ฐ๋ฅํฉ๋๋ค.
Both the base model and dataset used for fine tuning were used for learning purposes, so this model can also be used only for learning purposes.
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 8000
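With `lr_scheduler_type: linear`, the learning rate rises linearly from 0 to its peak over the warmup steps. It then decays linearly to 0 at the last training step, following the same formula as `transformers.get_linear_schedule_with_warmup`. The following stdlib-only sketch reproduces that shape with the values listed above. It illustrates the schedule only and is not the Trainer's own code. `linear_lr` is a hypothetical helper name.

```python
PEAK_LR = 1e-5       # learning_rate
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps
TOTAL_STEPS = 8000   # training_steps


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: fall linearly from PEAK_LR to 0 at TOTAL_STEPS, never going negative.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for s in (0, 250, 500, 4250, 8000):
    print(s, linear_lr(s))
```

The peak is reached at step 500. Halfway through the decay phase (step 4250), the rate is half the peak (5e-6).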
Training results
Training Loss | Epoch | Step | Validation Loss | CER (%)
---|---|---|---|---
0.6416 | 0.44 | 1000 | 0.6564 | 64.1489 |
0.5914 | 0.88 | 2000 | 0.5688 | 37.4957 |
0.435 | 1.32 | 3000 | 0.5349 | 32.6734 |
0.4056 | 1.76 | 4000 | 0.5124 | 30.9065 |
0.3368 | 2.2 | 5000 | 0.5057 | 32.6925 |
0.3107 | 2.64 | 6000 | 0.4979 | 32.8315 |
0.3016 | 3.08 | 7000 | 0.4947 | 29.3060 |
0.2979 | 3.52 | 8000 | 0.4941 | 30.7538 |
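The table shows that validation loss falls at every checkpoint, while CER fluctuates. The following sketch loads the table rows and picks out the checkpoint with the lowest CER and the one with the lowest validation loss. It also estimates steps per epoch from the Step and Epoch columns. The training-set size it derives assumes that train_batch_size (16) is the effective batch size: a single device with no gradient accumulation, which this card does not state. Because the epochs are rounded to two decimals, that figure is approximate.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    cer: float  # character error rate, in percent


# Copied from the training results table.
RESULTS = [
    Row(0.6416, 0.44, 1000, 0.6564, 64.1489),
    Row(0.5914, 0.88, 2000, 0.5688, 37.4957),
    Row(0.435, 1.32, 3000, 0.5349, 32.6734),
    Row(0.4056, 1.76, 4000, 0.5124, 30.9065),
    Row(0.3368, 2.2, 5000, 0.5057, 32.6925),
    Row(0.3107, 2.64, 6000, 0.4979, 32.8315),
    Row(0.3016, 3.08, 7000, 0.4947, 29.3060),
    Row(0.2979, 3.52, 8000, 0.4941, 30.7538),
]

best = min(RESULTS, key=lambda r: r.cer)           # checkpoint with the lowest CER
best_val = min(RESULTS, key=lambda r: r.val_loss)  # checkpoint with the lowest validation loss
final = RESULTS[-1]                                # last checkpoint (step 8000)

steps_per_epoch = final.step / final.epoch         # optimizer steps per pass over the data
TRAIN_BATCH_SIZE = 16                              # assumed to be the effective batch size
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"lowest CER: step {best.step} ({best.cer})")
print(f"lowest validation loss: step {best_val.step} ({best_val.val_loss})")
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{approx_train_examples:,.0f} training examples")
```

According to the table, step 7000 has the lowest CER (29.3060), while step 8000, the reported model, has the lowest validation loss.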
Framework versions
- Transformers 4.34.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
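To check how a local environment compares with the versions listed above, you can query installed package metadata from the standard library. The following is a minimal sketch. Note that PyTorch is installed under the distribution name `torch`, and `4.34.0.dev0` is a development build, so an exact Transformers match is unlikely and is not necessarily required. `CARD_VERSIONS` and `compare_versions` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions" (PyTorch's distribution name is "torch").
CARD_VERSIONS = {
    "transformers": "4.34.0.dev0",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.5",
    "tokenizers": "0.13.3",
}


def compare_versions() -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (version listed in this card, installed version or None)."""
    report = {}
    for package, card_version in CARD_VERSIONS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (card_version, installed)
    return report


for pkg, (card, installed) in compare_versions().items():
    status = "match" if card == installed else "differs"
    print(f"{pkg:12} card={card:14} installed={installed or 'missing':14} {status}")
```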
Model repository: INo0121/whisper-base-ko-callvoice (base model: openai/whisper-base)