nithinraok committed
Commit 1f0724b · 1 Parent(s): e9993e4
Update README.md
README.md CHANGED
@@ -21,12 +21,12 @@ tags:
 - automatic-speech-recognition
 - speech
 - audio
-- Transducer
 - FastConformer
 - Conformer
 - pytorch
 - NeMo
 - hf-asr-leaderboard
+- ctc
 license: cc-by-4.0
 widget:
 - example_title: Librispeech sample 1
@@ -117,7 +117,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 4.
+      value: 4.2
   - task:
       type: Automatic Speech Recognition
       name: automatic-speech-recognition
@@ -160,7 +160,6 @@ model-index:
     - name: Test WER
       type: wer
       value: 9.02
-
 metrics:
 - wer
 pipeline_tag: automatic-speech-recognition
@@ -179,7 +178,7 @@ img {
 | [![Language](https://img.shields.io/badge/Language-en-lightgrey#model-badge)](#datasets)
 
 
-parakeet-
+parakeet-ctc-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Suno.ai](https://www.suno.ai/) teams.
 It is an XXL version of FastConformer CTC [1] (around 1.1B parameters) model.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
 
@@ -198,7 +197,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.
+asr_model = nemo_asr.models.EncDecCTCBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")
 ```
 
 ### Transcribing using Python
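For reference, the loading line added in the hunk above is typically followed by a transcription call. A minimal sketch, assuming a placeholder audio path (`audio.wav`) and noting that the exact return type of `transcribe()` (plain strings vs. hypothesis objects) varies across NeMo versions:

```python
# Minimal sketch (not part of the commit): load the checkpoint named in the
# diff above and transcribe a local file. "audio.wav" is a placeholder path
# to a 16 kHz mono WAV file.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")

# transcribe() takes a list of audio file paths; depending on the NeMo
# version it returns plain strings or hypothesis objects, one per file.
outputs = asr_model.transcribe(["audio.wav"])
print(outputs[0])
```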
@@ -259,7 +258,7 @@ The training dataset consists of private subset with 40K hours of English speech
 
 The performance of Automatic Speech Recognition models is measuring using Word Error Rate. Since this dataset is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general.
 
-The following tables summarizes the performance of the available models in this collection with the
+The following tables summarizes the performance of the available models in this collection with the CTC decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
 
 |**Version**|**Tokenizer**|**Vocabulary Size**|**AMI**|**Earnings-22**|**Giga Speech**|**LS test-clean**|**SPGI Speech**|**TEDLIUM-v3**|**Vox Populi**|**Common Voice**|
 |---------|-----------------------|-----------------|---------------|---------------|------------|-----------|-----|-------|------|------|
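The hunk above reports results as Word Error Rate with greedy decoding. As an illustrative sketch only (not the evaluation harness behind the table), WER is the word-level edit distance divided by the reference length:

```python
# Illustrative reference implementation:
# WER = (substitutions + deletions + insertions) / number of reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against a four-word reference -> 0.25, i.e. 25% WER.
print(word_error_rate("the cat sat down", "the cat stood down"))
```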