Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -179,13 +179,13 @@ img {
179
  | [![Language](https://img.shields.io/badge/Language-en-lightgrey#model-badge)](#datasets)
180
 
181
 
182
- parakeet-rnnt-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) team and [Suno.ai](https://www.suno.ai/).
183
- It is a "extra extra large" version of FastConformer Transducer[1] (around 1.1B parameters) model.
184
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
185
 
186
  ## NVIDIA NeMo: Training
187
 
188
- To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
189
  ```
190
  pip install nemo_toolkit['all']
191
  ```
@@ -221,7 +221,7 @@ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
221
 
222
  ### Input
223
 
224
- This model accepts 16000 Hz Mono-channel Audio (wav files) as input.
225
 
226
  ### Output
227
 
@@ -241,7 +241,7 @@ The tokenizers for these models were built using the text transcripts of the tra
241
 
242
  The model was trained on 65K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams.
243
 
244
- Dataset contains following Public English speech sets (25K hrs)
245
 
246
  - Librispeech 960 hours of English speech
247
  - Fisher Corpus
@@ -251,9 +251,9 @@ Dataset contains following Public English speech sets (25K hrs)
251
  - VCTK
252
  - VoxPopuli (EN)
253
  - Europarl-ASR (EN)
254
- - Multilingual Librispeech (MLS EN) - 2,000 hrs subset
255
  - Mozilla Common Voice (v7.0)
256
- - People's Speech - 12,000 hrs subset
257
 
258
  ## Performance
259
 
 
179
  | [![Language](https://img.shields.io/badge/Language-en-lightgrey#model-badge)](#datasets)
180
 
181
 
182
+ parakeet-rnnt-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Suno.ai](https://www.suno.ai/) teams.
183
+ It is an XXL version of FastConformer Transducer [1] (around 1.1B parameters) model.
184
  See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
185
 
186
  ## NVIDIA NeMo: Training
187
 
188
+ To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest PyTorch version.
189
  ```
190
  pip install nemo_toolkit['all']
191
  ```
 
221
 
222
  ### Input
223
 
224
+ This model accepts 16000 Hz mono-channel audio (wav files) as input.
225
 
226
  ### Output
227
 
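Reviewer note on the Input line in this hunk: the requirement of 16000 Hz mono-channel wav audio can be checked before transcription. The following is a minimal sketch, not part of the README or of NeMo. It uses only Python's standard-library `wave` module, which reads uncompressed PCM wav files only, and the function name `check_wav_format` and both constants are hypothetical names chosen for illustration.

```python
import wave

# Values taken from the README's Input section; the names are illustrative.
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    # wave.open raises wave.Error for files that are not PCM wav.
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

A file that fails either check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.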
 
241
 
242
  The model was trained on 65K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams.
243
 
244
+ Dataset contains following Public English speech sets (25K hours)
245
 
246
  - Librispeech 960 hours of English speech
247
  - Fisher Corpus
 
251
  - VCTK
252
  - VoxPopuli (EN)
253
  - Europarl-ASR (EN)
254
+ - Multilingual Librispeech (MLS EN) - 2,000 hour subset
255
  - Mozilla Common Voice (v7.0)
256
+ - People's Speech - 12,000 hour subset
257
 
258
  ## Performance
259