Yurii Paniv committed on
Commit
22529f3
1 Parent(s): 8114fae

Update README

  1. README.md +6 -244
README.md CHANGED
@@ -8,254 +8,16 @@ app_file: app.py
8
  pinned: false
9
  ---
10
 
11
- # voice-recognition-ua
12
- This repository aims to apply [Coqui STT](https://github.com/coqui-ai/STT "STT") (formerly [DeepSpeech](https://github.com/mozilla/DeepSpeech)), a state-of-the-art speech recognition model, to the Ukrainian language.
13
- You can see online demo here: https://voice-recognition-ua.herokuapp.com (your voice is not stored).
14
- Source code is in this repository together with auto-deploy pipeline scripts.
15
- P.S. Due to small size of dataset (50 hours), don't expect production-grade performance.
16
- Contribute your voice to [Common Voice project](https://commonvoice.mozilla.org/uk "Common Voice") yourself, so we can improve model accuracy.
17
 
18
- This model is licensed under [Creative Commons Attribution-NonCommercial 4.0 International License](./LICENSE).
19
 
20
  Checkout latest releases here: https://github.com/robinhad/voice-recognition-ua/releases/.
21
 
22
  If you'd like to check out different models for Ukrainian language, please visit https://github.com/egorsmkv/speech-recognition-uk.
23
 
24
- ## Pre-run requirements
25
- Make sure to download:
26
- 1. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/uk.tflite
27
- 2. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/kenlm.scorer
28
-
29
- ## How to launch
30
- ```
31
- export FLASK_APP=main.py
32
- export TOKEN=<Telegram bot API key>
33
- flask run
34
- ```
35
-
36
- # How to train your own model
37
-
38
  Guides for importing data are available in [/scripts](/scripts) folder.
39
-
40
- Most of the guide is taken from here:
41
- https://deepspeech.readthedocs.io/en/v0.9.3/TRAINING.html
42
-
43
- Disclaimer: if you would like to continue working on the model, use https://github.com/coqui-ai/STT (maintained by the former DeepSpeech team, where development continues).
44
-
45
-
46
- ## Steps:
47
-
48
- <details>
49
- <summary>This guide could be outdated, please be aware.</summary>
50
- 1. Create g4dn.xlarge instance on AWS, Deep Learning AMI (Ubuntu 18.04), 150 GB of space.
51
-
52
- 2. Install Python requirements:
53
- ```
54
- sudo apt-get install python3-dev sox libsox-fmt-mp3 # sox is used for audio reading
55
- ```
56
-
57
- 3. Clone DeepSpeech branch v0.9.1
58
- ```
59
- git clone --branch v0.9.1 https://github.com/mozilla/DeepSpeech
60
- ```
61
- 4. Go into DeepSpeech directory:
62
- ```
63
- cd DeepSpeech
64
- ```
65
- 5. Create virtual environment using conda (it will be easier to manage CUDA libraries):
66
- ```
67
- conda create --prefix $HOME/tmp/deepspeech-train-venv/ python=3.7
68
- ```
69
- 6. Activate it:
70
- ```
71
- conda activate /home/ubuntu/tmp/deepspeech-train-venv
72
- ```
73
- 7. Install DeepSpeech requirements:
74
- ```
75
- pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0
76
- pip3 install --upgrade -e .
77
- ```
78
- 8. Install required CUDA libraries:
79
- ```
80
- conda install cudnn=7.6=cuda10.1_0
81
- pip3 install 'tensorflow-gpu==1.15.4'
82
- ```
83
- 9. Open https://commonvoice.mozilla.org/uk/datasets and copy link to Ukrainian dataset.
84
- ```
85
- cd ..
86
- wget <your_link_to_dataset>
87
- tar -xf uk.tar.gz
88
- ```
89
- You'll get a folder named `cv-corpus-5.1-2020-06-22`
90
- 10. Download alphabet, used for dataset.
91
- The alphabet is a file listing every symbol that can appear in the dataset. Model outputs are formed directly from the alphabet. The alphabet is also used for filtering: samples that contain symbols not in the alphabet are skipped.
92
- ```
93
- cd ./DeepSpeech
94
- mkdir data_uk
95
- cd ./data_uk
96
- wget https://github.com/robinhad/voice-recognition-ua/releases/download/v0.2/alphabet.txt
97
- ```
98
- NOTE: if you create your alphabet, make sure it's in UTF-8 format
99
-
100
- 11. Filter out data that contains symbols not in the alphabet:
101
- ```
102
- cd .. # DeepSpeech
103
- bin/import_cv2.py --filter_alphabet ./data_uk/alphabet.txt ../cv-corpus-5.1-2020-06-22/uk
104
- ```
105
- 12. (Optional step if you want to create model from scratch, expect low performance because of small dataset (~20 hours for Ukrainian))
106
- ```
107
- python3 DeepSpeech.py --train_files ../data/CV/en/clips/train.csv --dev_files ../data/CV/en/clips/dev.csv --test_files ../data/CV/en/clips/test.csv
108
- ```
109
- 13. Transfer Learning
110
- Transfer learning is a method of taking a model pre-trained on one dataset and applying it to a different but similar one. In speech recognition, for example, we can rely on the fact that each successive layer of the model deals with a higher-level concept: the early layers recognize sounds and low-level patterns, whereas the later layers are more involved in the final output (letters). So in this case we freeze all the layers (they are not updated during training) except the specified last ones, where we substitute the Ukrainian alphabet for the English one.
111
- Below we will download English model checkpoint and create folder for Ukrainian one.
112
- ```
113
- mkdir checkpoints
114
- cd ./checkpoints
115
- wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.1/deepspeech-0.9.1-checkpoint.tar.gz
116
- tar -xf deepspeech-0.9.1-checkpoint.tar.gz
117
- mkdir uk_transfer_checkpoint
118
- cd ..
119
- ```
120
- 14. Start the training itself (to change training parameters, run `python3 DeepSpeech.py --helpfull` for a list of all parameters).
121
- When the model finishes training, an error caused by a bug in DeepSpeech will prevent evaluating performance for now; we will fix it in the next step.
122
- It will take a while, ~11 minutes per epoch.
123
- ```
124
- python3 DeepSpeech.py \
125
- --train_cudnn \
126
- --drop_source_layers 2 \
127
- --alphabet_config_path ./data_uk/alphabet.txt \
128
- --save_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
129
- --load_checkpoint_dir ./checkpoints/deepspeech-0.9.1-checkpoint \
130
- --train_files ../cv-corpus-5.1-2020-06-22/uk/clips/train.csv \
131
- --dev_files ../cv-corpus-5.1-2020-06-22/uk/clips/dev.csv \
132
- --test_files ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv \
133
- --epochs 10
134
- ```
135
- 15. Evaluate model:
136
- ```
137
- python3 DeepSpeech.py \
138
- --train_cudnn \
139
- --alphabet_config_path ./data_uk/alphabet.txt \
140
- --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
141
- --train_files ../cv-corpus-5.1-2020-06-22/uk/clips/train.csv \
142
- --dev_files ../cv-corpus-5.1-2020-06-22/uk/clips/dev.csv \
143
- --test_files ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv \
144
- --test_batch_size 40 \
145
- --epochs 0
146
- ```
147
- It will take a while, approximately 20-30 minutes.
148
-
149
- You will get performance report:
150
- WER - Word Error Rate, the number of word-level errors (substitutions, insertions, deletions) divided by the number of words in the reference; it can exceed 100%.
151
- CER - Character Error Rate, the number of character-level errors (substitutions, insertions, deletions) divided by the number of characters in the reference.
152
- Here we have WER 95% and CER 36%.
153
- It is high because we don't use a scorer (a language model that maps a character sequence to the closest word match) during training. You can improve performance by creating a scorer for the Ukrainian language; Wikipedia articles can serve as the text corpus.
154
-
155
- <details>
156
- <summary>Test on ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv - WER: 0.950863, CER: 0.357779, loss: 59.444176</summary>
157
-
158
- --------------------------------------------------------------------------------
159
- Best WER:
160
- --------------------------------------------------------------------------------
161
- WER: 0.000000, CER: 0.000000, loss: 2.696858
162
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21203420.wav
163
- - src: "я замер"
164
- - res: "я замер"
165
- --------------------------------------------------------------------------------
166
- WER: 0.000000, CER: 0.000000, loss: 1.772630
167
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21755897.wav
168
- - src: "що саме"
169
- - res: "що саме"
170
- --------------------------------------------------------------------------------
171
- WER: 0.000000, CER: 0.000000, loss: 0.269474
172
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21350648.wav
173
- - src: "ні"
174
- - res: "ні"
175
- --------------------------------------------------------------------------------
176
- WER: 0.250000, CER: 0.066667, loss: 7.652889
177
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22161067.wav
178
- - src: "і вухом не веде"
179
- - res: "і вухом не виде"
180
- --------------------------------------------------------------------------------
181
- WER: 0.333333, CER: 0.142857, loss: 22.727850
182
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_20894315.wav
183
- - src: "подробиці наразі уточнюються"
184
- - res: "подробиці наразі удочнвітцся"
185
- --------------------------------------------------------------------------------
186
- Median WER:
187
- --------------------------------------------------------------------------------
188
- WER: 1.000000, CER: 0.408163, loss: 77.099953
189
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21565481.wav
190
- - src: "це було висвітлено і в засобах масової інформації"
191
- - res: "сцеболовистітоно ів засовавнасавинсерматції"
192
- --------------------------------------------------------------------------------
193
- WER: 1.000000, CER: 0.304878, loss: 76.661797
194
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21568626.wav
195
- - src: "всі ці зірки для тебе сказав хлопчик і ударив дівчинку металевим тазіком по голові"
196
- - res: "сицізяртідлетебе сказавни хлобчик юдаревдів чимкуметалевимтазіком поговолі"
197
- --------------------------------------------------------------------------------
198
- WER: 1.000000, CER: 0.261364, loss: 76.638161
199
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22071941.wav
200
- - src: "кабінет міністрів україни складає повноваження перед новообраною верховною радою україни"
201
- - res: "кабіна міністрівукаїни колале повнваженя перебновообрануюварховли радийву країни"
202
- --------------------------------------------------------------------------------
203
- WER: 1.000000, CER: 0.403846, loss: 76.634865
204
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21381457.wav
205
- - src: "механізм формування агатів остаточно не встановлений"
206
- - res: "махенізаформовання оатья востотачномистоновлими"
207
- --------------------------------------------------------------------------------
208
- WER: 1.000000, CER: 0.415094, loss: 76.133347
209
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21567387.wav
210
- - src: "засідання верховної ради україни проводяться відкрито"
211
- - res: "засі веневорковмаградиукраїне проодізівікрипо"
212
- --------------------------------------------------------------------------------
213
- Worst WER:
214
- --------------------------------------------------------------------------------
215
- WER: 1.500000, CER: 0.266667, loss: 18.258444
216
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_20900153.wav
217
- - src: "вона віддасться"
218
- - res: "пона віддас ця"
219
- --------------------------------------------------------------------------------
220
- WER: 1.500000, CER: 0.307692, loss: 15.984250
221
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_22247322.wav
222
- - src: "ескулап лікар"
223
- - res: "е скула лліка"
224
- --------------------------------------------------------------------------------
225
- WER: 1.500000, CER: 0.277778, loss: 15.076320
226
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21582521.wav
227
- - src: "цензура заборонена"
228
- - res: "зан зура забороонено"
229
- --------------------------------------------------------------------------------
230
- WER: 1.666667, CER: 0.478261, loss: 42.762665
231
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21568871.wav
232
- - src: "пегас символізує поезію"
233
- - res: "веляс це волі зуя поєсі"
234
- --------------------------------------------------------------------------------
235
- WER: 2.000000, CER: 0.333333, loss: 10.796988
236
- - wav: file://../cv-corpus-5.1-2020-06-22/uk/clips/common_voice_uk_21563967.wav
237
- - src: "легітимність"
238
- - res: "вегі пимнсть"
239
- --------------------------------------------------------------------------------
240
- </details>
241
-
242
- 16. To export model for later usage:
243
- ```
244
- mkdir model
245
- # export .pb file
246
- python3 DeepSpeech.py \
247
- --train_cudnn \
248
- --alphabet_config_path ./data_uk/alphabet.txt \
249
- --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
250
- --export_dir ./model \
251
- --epochs 0
252
- # export .tflite file for embedded usage
253
- python3 DeepSpeech.py \
254
- --train_cudnn \
255
- --alphabet_config_path ./data_uk/alphabet.txt \
256
- --load_checkpoint_dir ./checkpoints/uk_transfer_checkpoint \
257
- --export_tflite --export_dir ./model \
258
- --epochs 0
259
- ```
260
- For advanced usage please refer to https://deepspeech.readthedocs.io/en/v0.9.1/USING.html
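
The alphabet-based filtering described in step 10 can be illustrated with a minimal Python sketch. It is not the code of `bin/import_cv2.py`; the helper names `load_alphabet` and `filter_transcripts` are hypothetical, and the file format (one symbol per line, lines starting with `#` treated as comments) is an assumption made for this example.

```python
from typing import Iterable, List, Set


def load_alphabet(lines: Iterable[str]) -> Set[str]:
    """Collect allowed symbols, assuming one symbol per line.

    Assumption: lines starting with '#' are comments. Only the trailing
    newline is stripped, so a line holding a single space stays a symbol.
    """
    allowed: Set[str] = set()
    for line in lines:
        symbol = line.rstrip("\n")
        if symbol == "" or symbol.startswith("#"):
            continue
        allowed.add(symbol)
    return allowed


def filter_transcripts(transcripts: Iterable[str], allowed: Set[str]) -> List[str]:
    """Keep only transcripts whose every character is in the alphabet."""
    return [t for t in transcripts if all(ch in allowed for ch in t)]


# Hypothetical miniature alphabet, just large enough for the example.
alphabet = load_alphabet(["# hypothetical alphabet", " ", "а", "в", "е", "з", "м", "р", "я"])
kept = filter_transcripts(["я замер", "я замер 2 рази"], alphabet)
print(kept)  # ['я замер'] (the second transcript contains '2', 'и', which are not allowed)
```

With a real alphabet file, the lines would come from `open("alphabet.txt", encoding="utf-8")`, which is why the file has to be saved as UTF-8.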
261
- </details>
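
To make the WER and CER figures in the evaluation report above easier to read, here is a minimal Python sketch that computes both from a reference (`src`) and a recognized (`res`) string. It assumes the usual definition (Levenshtein edit distance divided by the reference length) and is not DeepSpeech's own implementation; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


# A src/res pair taken from the report above.
src, res = "і вухом не веде", "і вухом не виде"
print(f"{wer(src, res):.6f} {cer(src, res):.6f}")  # 0.250000 0.066667
```

Because insertions count as errors, a one-word reference recognized as two wrong words yields a WER of 2.0, which is how the "Worst WER" entries exceed 1.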
 
8
  pinned: false
9
  ---
10
 
11
+ # 🇺🇦🎤 Voice recognition for Ukrainian language
12
+ This repository aims to apply the [Coqui STT](https://github.com/coqui-ai/STT "STT") (formerly [DeepSpeech](https://github.com/mozilla/DeepSpeech)) speech recognition model to the Ukrainian language.
13
+ You can see online demo here: https://huggingface.co/spaces/robinhad/ukrainian-stt.
14
+ Source code is in this repository together with auto-deploy pipeline scripts.
 
 
15
 
16
+ Models trained on non-free data are licensed under the [Creative Commons Attribution-NonCommercial 4.0 International License](./LICENSE); all other models are MIT-licensed (the license is marked on each model).
17
 
18
  Checkout latest releases here: https://github.com/robinhad/voice-recognition-ua/releases/.
19
 
20
  If you'd like to check out different models for Ukrainian language, please visit https://github.com/egorsmkv/speech-recognition-uk.
21
 
22
+ # 🤖 Data import scripts
 
23
  Guides for importing data are available in [/scripts](/scripts) folder.