poonehmousavi committed on
Commit f37709b
1 Parent(s): 0f0deee

Update README.md

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language:
- - en
  thumbnail: null
  tags:
  - automatic-speech-recognition
@@ -11,17 +11,16 @@ tags:
  license: apache-2.0
  datasets:
  - common_voice
-
  metrics:
- - name: Test WER
- type: wer
- value: ' 23.88'
  ---
 
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>
 
- # CRDNN with CTC/Attention trained on CommonVoice 14.0 English (No LM)
  This repository provides all the necessary tools to perform automatic speech
  recognition from an end-to-end system pretrained on CommonVoice (German Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
@@ -30,7 +29,7 @@ The performance of the model is the following:
 
  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
- | 15.08.23 | 12.76 | 23.88 | 1xV100 32GB |
 
  ## Credits
  The model is provided by [vitas.ai](https://www.vitas.ai/).
@@ -39,7 +38,7 @@ The model is provided by [vitas.ai](https://www.vitas.ai/).
  This ASR system is composed of 2 different but linked blocks:
 
  - Tokenizer (unigram) that transforms words into subword units and trained with
- the train transcriptions (train.tsv) of CommonVoice (en).
  - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
  N blocks of convolutional neural networks with normalization and pooling on the
  frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
@@ -58,12 +57,12 @@ pip install speechbrain
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
- ### Transcribing your own audio files (in English)
 
  ```python
  from speechbrain.pretrained import EncoderDecoderASR
- asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/speechbrain/asr-crdnn-commonvoice-14-en", savedir="pretrained_models/speechbrain/asr-crdnn-commonvoice-14-en")
- asr_model.transcribe_file("speechbrain/speechbrain/asr-crdnn-commonvoice-14-en/example-en.wav")
  ```
 
  ### Inference on GPU
@@ -97,10 +96,10 @@ pip install -e .
 
  ```
  cd recipes/CommonVoice/ASR/seq2seq
- python train.py hparams/train_en.yaml --data_folder=your_data_folder
  ```
 
- You can find our training results (models, logs, etc) [here](https://www.dropbox.com/sh/zgatirb118f79ef/AACmjh-D94nNDWcnVI4Ef5K7a?dl=0)
 
  ### Limitations

@@ -1,6 +1,6 @@
  ---
  language:
+ - it
  thumbnail: null
  tags:
  - automatic-speech-recognition
@@ -11,17 +11,16 @@ tags:
  license: apache-2.0
  datasets:
  - common_voice
  metrics:
+ - name: Test WER
+ type: wer
+ value: ' 17.02'
  ---
 
  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>
 
+ # CRDNN with CTC/Attention trained on CommonVoice 14.0 Italian (No LM)
  This repository provides all the necessary tools to perform automatic speech
  recognition from an end-to-end system pretrained on CommonVoice (German Language) within
  SpeechBrain. For a better experience, we encourage you to learn more about
@@ -30,7 +29,7 @@ The performance of the model is the following:
 
  | Release | Test CER | Test WER | GPUs |
  |:-------------:|:--------------:|:--------------:| :--------:|
+ | 15.08.23 | 12.76 | 6.27 | 1xV100 32GB |
 
  ## Credits
  The model is provided by [vitas.ai](https://www.vitas.ai/).
@@ -39,7 +38,7 @@ The model is provided by [vitas.ai](https://www.vitas.ai/).
  This ASR system is composed of 2 different but linked blocks:
 
  - Tokenizer (unigram) that transforms words into subword units and trained with
+ the train transcriptions (train.tsv) of CommonVoice (it).
  - Acoustic model (CRDNN + CTC/Attention). The CRDNN architecture is made of
  N blocks of convolutional neural networks with normalization and pooling on the
  frequency domain. Then, a bidirectional LSTM is connected to a final DNN to obtain
@@ -58,12 +57,12 @@ pip install speechbrain
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).
 
+ ### Transcribing your own audio files (in Italian)
 
  ```python
  from speechbrain.pretrained import EncoderDecoderASR
+ asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/speechbrain/asr-crdnn-commonvoice-14-it", savedir="pretrained_models/speechbrain/asr-crdnn-commonvoice-14-it")
+ asr_model.transcribe_file("speechbrain/speechbrain/asr-crdnn-commonvoice-14-it/example-it.wav")
  ```
 
  ### Inference on GPU
@@ -97,10 +96,10 @@ pip install -e .
 
  ```
  cd recipes/CommonVoice/ASR/seq2seq
+ python train.py hparams/train_it.yaml --data_folder=your_data_folder
  ```
 
+ You can find our training results (models, logs, etc) [here](https://www.dropbox.com/sh/ss59uu0j5boscvp/AAASsiFhlB1nDWPkFX410bzna?dl=0)
 
  ### Limitations
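This commit changes the Test WER reported in the model-card metadata and in the results table. Word error rate (WER) and character error rate (CER) are conventionally computed as the edit distance between the reference and the hypothesis, counted in words or characters, divided by the number of reference units. The following is a minimal, self-contained sketch of that computation. It is not SpeechBrain's own scoring code. The function names `edit_distance` and `error_rate` are hypothetical, and counting spaces as characters for CER is an assumption, since toolkits differ on this point.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Substitutions, deletions and insertions all cost 1.
    """
    # At the start of iteration i, prev[j] is the distance between
    # the first i-1 reference tokens and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char"), in percent.

    Total edits over all utterances divided by the total number of
    reference units. Spaces count as characters for CER (an assumption).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        elif unit == "char":
            r, h = list(ref), list(hyp)
        else:
            raise ValueError("unit must be 'word' or 'char'")
        edits += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references contain no units to score")
    return 100.0 * edits / total


refs = ["il gatto dorme sul divano"]
hyps = ["il gatto dorme su divano"]
print(error_rate(refs, hyps))               # 20.0 (1 substitution / 5 words)
print(error_rate(refs, hyps, unit="char"))  # 4.0 (1 deletion / 25 characters)
```

The sketch sums the edits over the whole test set and then divides, instead of averaging per-utterance rates, so long utterances weigh proportionally more. Because insertions count as errors, WER computed this way can exceed 100%.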