dominguesm committed
Commit 14fecbf
1 Parent(s): 31b09b5
Update README.md
README.md
CHANGED
@@ -93,8 +93,6 @@ QuartzNet models take in audio segments and transcribe them to letter, byte pair
|
|
93 |
|
94 |
All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
|
95 |
|
96 |
-
**Soon more information**
|
97 |
-
|
98 |
|
99 |
### Datasets
|
100 |
|
@@ -104,12 +102,45 @@ The model was trained with a part of the Common Voices 9.0 dataset in Portuguese
|
|
104 |
|
105 |
## Performance
|
106 |
|
107 |
-
|
108 |
|
109 |
## Limitations
|
110 |
|
111 |
Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
|
112 |
|
113 |
|
114 |
## References
|
115 |
|
|
|
93 |
|
94 |
All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
|
95 |
|
96 |
|
97 |
### Datasets
|
98 |
|
|
|
102 |
|
103 |
## Performance
|
104 |
|
105 |
+
| Metric | Score |
| ------ | ----- |
| WER    | 49%   |
| CER    | 18%   |
|
109 |
+
|
110 |
+
The metrics were obtained using the following code:
|
111 |
+
|
112 |
+
**Attention**: Run the steps below only after downloading the dataset (Mozilla Common Voice 9.0 PT) and completing the pre-processing steps for the audio data and `manifest` files described in the notebook [`notebooks/Finetuning CTC model Portuguese.ipynb`](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small).
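Before running the evaluation, it can help to confirm that the processed manifest is well-formed. The sketch below assumes the usual NeMo layout of one JSON object per line with `audio_filepath`, `duration`, and `text` keys. The helper name `parse_manifest` and the sample entries are hypothetical and are not part of the repository.

```python
import json

# Keys assumed to be present in every manifest entry (NeMo ASR convention).
REQUIRED_KEYS = ("audio_filepath", "duration", "text")


def parse_manifest(lines):
    """Parse JSON-lines manifest entries and check that the expected keys exist."""
    entries = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"line {number}: missing keys {missing}")
        entries.append(entry)
    return entries


# Hypothetical sample entries; a real manifest would be read from disk instead.
sample = [
    '{"audio_filepath": "clips/a.wav", "duration": 2.5, "text": "bom dia"}',
    '{"audio_filepath": "clips/b.wav", "duration": 1.0, "text": "obrigado"}',
]
entries = parse_manifest(sample)
total_seconds = sum(entry["duration"] for entry in entries)
print(len(entries), "entries,", total_seconds, "seconds of audio")
```

To check a real file, the same function could be called on an open file handle, for example `parse_manifest(open(path, encoding="utf-8"))`, since it only iterates over lines.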
|
113 |
+
|
114 |
+
```bash
$ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/speech_to_text_eval.py"

$ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/transcribe_speech.py"

$ python scripts/speech_to_text_eval.py \
    pretrained_name="dominguesm/stt_pt_quartznet15x5_ctc_small" \
    dataset_manifest="manifests/pt/commonvoice_test_manifest_processed.json" \
    output_filename="./evaluation_transcripts.json" \
    batch_size=32 \
    amp=true \
    use_cer=false
```
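For reference, both metrics are edit-distance ratios: WER counts word-level substitutions, deletions, and insertions against the number of reference words, and CER does the same over characters. The sketch below is an independent illustration of that definition, not the NeMo implementation; the function names and the sample sentences are hypothetical. Note that the command above sets `use_cer=false`, which reports WER; the CER figure was presumably obtained from a separate run with `use_cer=true`, though the source does not say so.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion from the reference
                current[j - 1] + 1,      # insertion into the hypothesis
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(pairs, use_cer=False):
    """Corpus-level WER (default) or CER over (reference, hypothesis) pairs."""
    errors = 0
    total = 0
    for reference, hypothesis in pairs:
        # Characters (spaces included) for CER, whitespace-split words for WER.
        ref_units = list(reference) if use_cer else reference.split()
        hyp_units = list(hypothesis) if use_cer else hypothesis.split()
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    if total == 0:
        raise ValueError("reference text is empty")
    return errors / total


# Hypothetical reference/prediction pairs, for illustration only.
pairs = [
    ("bom dia a todos", "bom dia todos"),
    ("muito obrigado", "muito obrigada"),
]
wer = error_rate(pairs)
cer = error_rate(pairs, use_cer=True)
print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```

If `evaluation_transcripts.json` follows the JSON-lines layout with `text` and `pred_text` fields (an assumption worth checking against the file itself), the pairs could be built from those two fields and passed to `error_rate` as a cross-check of the reported scores.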
|
127 |
|
128 |
## Limitations
|
129 |
|
130 |
Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
|
131 |
|
132 |
+
## Citation
|
133 |
+
|
134 |
+
If you use our work, please cite:
|
135 |
+
|
136 |
+
```cite
@misc{domingues2022quartznet15x15-small-portuguese,
  title={Fine-tuned {Quartznet}-15x5 CTC small model for speech recognition in {P}ortuguese},
  author={Domingues, Maicon},
  howpublished={\url{https://huggingface.co/dominguesm/stt_pt_quartznet15x5_ctc_small}},
  year={2022}
}
```
|
144 |
|
145 |
## References
|
146 |
|