nithinraok committed
Commit 1f0724b · 1 Parent(s): e9993e4
Update README.md
README.md CHANGED
@@ -21,12 +21,12 @@ tags:
 - automatic-speech-recognition
 - speech
 - audio
-- Transducer
 - FastConformer
 - Conformer
 - pytorch
 - NeMo
 - hf-asr-leaderboard
+- ctc
 license: cc-by-4.0
 widget:
 - example_title: Librispeech sample 1
@@ -117,7 +117,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 4.
+      value: 4.2
   - task:
       type: Automatic Speech Recognition
       name: automatic-speech-recognition
@@ -160,7 +160,6 @@ model-index:
     - name: Test WER
       type: wer
       value: 9.02
-
 metrics:
 - wer
 pipeline_tag: automatic-speech-recognition
@@ -179,7 +178,7 @@ img {
 | [![Language](https://img.shields.io/badge/Language-en-lightgrey#model-badge)](#datasets)
 
 
-parakeet-
+parakeet-ctc-1.1b is an ASR model that transcribes speech in lower case English alphabet. This model is jointly developed by [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Suno.ai](https://www.suno.ai/) teams.
 It is an XXL version of FastConformer CTC [1] (around 1.1B parameters) model.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer) for complete architecture details.
 
@@ -198,7 +197,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.
+asr_model = nemo_asr.models.EncDecCTCBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")
 ```
 
 ### Transcribing using Python
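For reference, the loading line added in the hunk above is typically followed by a transcription call. A minimal sketch, assuming a placeholder audio path (`audio.wav`) and noting that the exact return type of `transcribe()` (plain strings vs. hypothesis objects) varies across NeMo versions:

```python
# Minimal sketch (not part of the commit): load the checkpoint named in the
# diff above and transcribe a local file. "audio.wav" is a placeholder path
# to a 16 kHz mono WAV file.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCBPEModel.from_pretrained(model_name="nvidia/parakeet-ctc-1.1b")

# transcribe() takes a list of audio file paths; depending on the NeMo
# version it returns plain strings or hypothesis objects, one per file.
outputs = asr_model.transcribe(["audio.wav"])
print(outputs[0])
```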
@@ -259,7 +258,7 @@ The training dataset consists of private subset with 40K hours of English speech
 
 The performance of Automatic Speech Recognition models is measuring using Word Error Rate. Since this dataset is trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general.
 
-The following tables summarizes the performance of the available models in this collection with the
+The following tables summarizes the performance of the available models in this collection with the CTC decoder. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
 
 |**Version**|**Tokenizer**|**Vocabulary Size**|**AMI**|**Earnings-22**|**Giga Speech**|**LS test-clean**|**SPGI Speech**|**TEDLIUM-v3**|**Vox Populi**|**Common Voice**|
 |---------|-----------------------|-----------------|---------------|---------------|------------|-----------|-----|-------|------|------|
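The hunk above reports results as Word Error Rate with greedy decoding. As an illustrative sketch only (not the evaluation harness behind the table), WER is the word-level edit distance divided by the reference length:

```python
# Illustrative reference implementation:
# WER = (substitutions + deletions + insertions) / number of reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution against a four-word reference -> 0.25, i.e. 25% WER.
print(word_error_rate("the cat sat down", "the cat stood down"))
```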