Yurii Paniv committed
Commit 72475af · 1 Parent(s): 6fb0a7e
Add missing information to README.md
1. Add disclaimer.
2. Add link to Coqui STT.
3. Hide guide by default.
4. Update guide.
README.md
CHANGED
@@ -1,27 +1,42 @@
 # voice-recognition-ua
-This is a repository with aim to apply [
 You can see online demo here: https://voice-recognition-ua.herokuapp.com (your voice is not stored).
 Source code is in this repository together with auto-deploy pipeline scripts.
-P.S. Due to small size of dataset (
 Contribute your voice to [Common Voice project](https://commonvoice.mozilla.org/uk "Common Voice") yourself, so we can improve model accuracy.

 ## Pre-run requirements
 Make sure to download:
-1. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.
-

 ## How to launch
 ```
 export FLASK_APP=main.py
 flask run
 ```

 # How to train your own model

 Most of the guide is took from there:
-https://deepspeech.readthedocs.io/en/v0.9.

 ## Steps:
 1. Create g4dn.xlarge instance on AWS, Deep Learning AMI (Ubuntu 18.04), 150 GB of space.

 2. Install Python requirements:
@@ -126,8 +141,10 @@ WER - Word Error Rate, calculates how much characters were guessed correctly.
 CER - Character Error Rate, calculates how much characters were guessed correctly.
 Here we have WER 95% and CER 36%.
 It is high because we don't use scorer (language model that maps chacter sequence to the closest word match) during training, you can improve performance if you create scorer for Ukrainian language. As a text corpus you can use Wikipedia articles.
-
-
 --------------------------------------------------------------------------------
 Best WER:
 --------------------------------------------------------------------------------
@@ -210,7 +227,8 @@ WER: 2.000000, CER: 0.333333, loss: 10.796988
 - src: "легітимність"
 - res: "вегі пимнсть"
 --------------------------------------------------------------------------------
-
 16. To export model for later usage:
 ```
 mkdir model
@@ -230,3 +248,4 @@ python3 DeepSpeech.py \
 --epochs 0
 ```
 For advanced usage please refer to https://deepspeech.readthedocs.io/en/v0.9.1/USING.html
@@ -1,27 +1,42 @@
 # voice-recognition-ua
+This is a repository with aim to apply [Coqui STT](https://github.com/coqui-ai/STT "STT")(formerly [DeepSpeech](https://github.com/mozilla/DeepSpeech)) (state-of-the-art speech recognition model) on Ukrainian language.
 You can see online demo here: https://voice-recognition-ua.herokuapp.com (your voice is not stored).
 Source code is in this repository together with auto-deploy pipeline scripts.
+P.S. Due to small size of dataset (50 hours), don't expect production-grade performance.
 Contribute your voice to [Common Voice project](https://commonvoice.mozilla.org/uk "Common Voice") yourself, so we can improve model accuracy.

+<h2>CAUTION: THIS MODEL AND SCORER IS PUBLISHED ONLY FOR RESEARCH AND NON-COMMERCIAL USE.</h2>
+
+Checkout latest releases here: https://github.com/robinhad/voice-recognition-ua/releases/.
+
+If you'd like to check out different models for Ukrainian language, please visit https://github.com/egorsmkv/speech-recognition-uk.
+
 ## Pre-run requirements
 Make sure to download:
+1. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/uk.tflite
+2. https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/kenlm.scorer
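
A minimal Python sketch for fetching the two release files listed above. The URLs are the ones given in this README. The helper names (`target_name`, `download_release_files`) and the default of saving into the current directory are assumptions for illustration, since the README does not say where `main.py` expects the files.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the "Pre-run requirements" list.
RELEASE_FILES = [
    "https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/uk.tflite",
    "https://github.com/robinhad/voice-recognition-ua/releases/download/v0.4/kenlm.scorer",
]


def target_name(url):
    """Return the file name component of a download URL."""
    return os.path.basename(urlparse(url).path)


def download_release_files(dest_dir=".", urls=RELEASE_FILES):
    """Download each file into dest_dir, skipping files that already exist.

    dest_dir is an assumption: point it at wherever the app loads the model from.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, target_name(url))
        if not os.path.exists(path):
            # Only hit the network when the file is not already present.
            urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling `download_release_files()` once before launching should leave `uk.tflite` and `kenlm.scorer` in the chosen directory. Repeated calls skip files that are already there.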

 ## How to launch
 ```
 export FLASK_APP=main.py
+export TOKEN=<Telegram bot API key>
 flask run
 ```
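
Before running `flask run`, it can help to confirm that the prerequisites are in place. The sketch below is a hypothetical pre-flight check, not part of the repository: the function name `missing_prerequisites` is made up for illustration, and it assumes the model and scorer sit in the directory you launch from, which the README does not state.

```python
import os

REQUIRED_ENV = ("FLASK_APP", "TOKEN")           # variables exported in the launch commands
REQUIRED_FILES = ("uk.tflite", "kenlm.scorer")  # files named under "Pre-run requirements"


def missing_prerequisites(env=None, directory="."):
    """Return the names of unset environment variables and absent files."""
    env = os.environ if env is None else env
    # A variable counts as missing when it is unset or empty.
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    # Assumption: the files are expected in `directory`.
    missing += [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
    return missing
```

An empty list means both variables are set and both files were found. Anything else names what still needs to be exported or downloaded.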

 # How to train your own model

+Guides for importing data are available in [/scripts](/scripts) folder.
+
 Most of the guide is took from there:
+https://deepspeech.readthedocs.io/en/v0.9.3/TRAINING.html
+
+Disclaimer: if you would like to continue working on the model, use https://github.com/coqui-ai/STT (this is former DeepSpeech team, where development continues).
+

 ## Steps:
+
+<details>
+<summary>This guide could be outdated, please be aware.</summary>
 1. Create g4dn.xlarge instance on AWS, Deep Learning AMI (Ubuntu 18.04), 150 GB of space.

 2. Install Python requirements:
@@ -126,8 +141,10 @@ WER - Word Error Rate, calculates how much characters were guessed correctly.
 CER - Character Error Rate, calculates how much characters were guessed correctly.
 Here we have WER 95% and CER 36%.
 It is high because we don't use scorer (language model that maps chacter sequence to the closest word match) during training, you can improve performance if you create scorer for Ukrainian language. As a text corpus you can use Wikipedia articles.
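
Both metrics are conventionally computed as a Levenshtein edit distance divided by the length of the reference transcript, so lower is better and values above 1 are possible. The following is a minimal sketch of that convention, not DeepSpeech's own evaluation code, and the function names are illustrative. For the `src`/`res` pair that appears in the test report in this diff, it gives WER 2.0 and CER of about 0.333, consistent with the figures shown in the hunk header above that pair.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


src = "легітимність"  # reference transcript from the report
res = "вегі пимнсть"  # model output from the report
print(f"WER: {wer(src, res):.6f}, CER: {cer(src, res):.6f}")
```

Here the single reference word comes back as two wrong words (one substitution plus one insertion), which is why WER reaches 2.0 while only 4 of 12 characters need editing.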
+
+<details>
+<summary>Test on ../cv-corpus-5.1-2020-06-22/uk/clips/test.csv - WER: 0.950863, CER: 0.357779, loss: 59.444176</summary>
+
 --------------------------------------------------------------------------------
 Best WER:
 --------------------------------------------------------------------------------
@@ -210,7 +227,8 @@ WER: 2.000000, CER: 0.333333, loss: 10.796988
 - src: "легітимність"
 - res: "вегі пимнсть"
 --------------------------------------------------------------------------------
+</details>
+
 16. To export model for later usage:
 ```
 mkdir model
@@ -230,3 +248,4 @@ python3 DeepSpeech.py \
 --epochs 0
 ```
 For advanced usage please refer to https://deepspeech.readthedocs.io/en/v0.9.1/USING.html
+</details>