dominguesm/stt_pt_quartznet15x5_ctc_small

Automatic Speech Recognition


dominguesm committed on Jun 26, 2022

Commit 14fecbf • 1 Parent(s): 31b09b5

Update README.md

Files changed (1)

README.md +34 -3


@@ -93,8 +93,6 @@ QuartzNet models take in audio segments and transcribe them to letter, byte pair
 All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
-**Soon more information**
 ### Datasets
@@ -104,12 +102,45 @@ The model was trained with a part of the Common Voices 9.0 dataset in Portuguese
 ## Performance
-**Coming soon**
 ## Limitations
 Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 ## References

 All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
 ### Datasets
 ## Performance
+| Metric | Score |
+| ------- | ----- |
+| WER     | 49%   |
+| CER     | 18%   |
+The metrics were obtained using the following code:
+**Attention**: Run the steps below only after downloading the dataset (Mozilla Common Voice 9.0 PT) and completing the audio and `manifest` preprocessing steps in the notebook [`notebooks/Finetuning CTC model Portuguese.ipynb`](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small).
+```bash
+$ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/speech_to_text_eval.py"
+$ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/transcribe_speech.py"
+$ python scripts/speech_to_text_eval.py \
+    pretrained_name="dominguesm/stt_pt_quartznet15x5_ctc_small" \
+    dataset_manifest="manifests/pt/commonvoice_test_manifest_processed.json" \
+    output_filename="./evaluation_transcripts.json" \
+    batch_size=32 \
+    amp=true \
+    use_cer=false
+```
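The command above sets `use_cer=false`, while the table reports both WER and CER. As a minimal sketch of how both figures could be derived from one transcription run, the Python below computes word and character error rates with a plain Levenshtein distance. It assumes the output file is a JSON-lines manifest whose entries carry the reference in a `text` field and the hypothesis in a `pred_text` field; the helper names `edit_distance`, `error_rate`, and `load_manifest` are hypothetical and are not part of NeMo.

```python
import json


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, use_cer=False):
    """Corpus-level error rate: total edits divided by total reference units.

    Units are characters when use_cer is True, whitespace-separated words otherwise.
    """
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = list(ref) if use_cer else ref.split()
        hyp_units = list(hyp) if use_cer else hyp.split()
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return errors / total if total else float("nan")


def load_manifest(path):
    """Read reference/hypothesis pairs from a JSON-lines manifest.

    Assumption: each line holds a JSON object with `text` and `pred_text` keys.
    """
    refs, hyps = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                refs.append(entry["text"])
                hyps.append(entry["pred_text"])
    return refs, hyps
```

Under those assumptions, `refs, hyps = load_manifest("evaluation_transcripts.json")` followed by `error_rate(refs, hyps)` and `error_rate(refs, hyps, use_cer=True)` would yield the two metrics as fractions; multiply by 100 to compare them with the percentages in the table.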
 ## Limitations
 Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
+## Citation
+If you use our work, please cite:
+```bibtex
+@misc{domingues2022quartznet15x15-small-portuguese,
+  title={Fine-tuned {Quartznet}-15x5 CTC small model for speech recognition in {P}ortuguese},
+  author={Domingues, Maicon},
+  howpublished={\url{https://huggingface.co/dominguesm/stt_pt_quartznet15x5_ctc_small}},
+  year={2022}
+}
+```
 ## References