Automatic Speech Recognition
NeMo
Spanish
FastConformer
Nune1 committed · verified · commit 57634be · 1 parent: c30b6bb

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -134,7 +134,7 @@ The model was trained on around 3400 hours of Spanish speech data.
 The performance of Automatic Speech Recognition models is measured using Character Error Rate (CER) and Word Error Rate (WER).
 Table 1 summarizes the performance of the model with the Transducer and CTC decoders across different datasets.
 
-| Model | MCV %WER/CER test | MLS %WER/CER test | Voxpopuli %WER/CER test | Fisher %WER/CER test |
+| Model | MCV %WER/CER | MLS %WER/CER | Voxpopuli %WER/CER | Fisher %WER/CER |
 |-----------|--------------|---------------|--------------|---------------|
 | RNNT head | 7.58 / 1.96 | 12.43 / 2.99 | 9.59 / 3.67 | 30.76 / 11.49 |
 | CTC head | 8.23 / 2.20 | 12.63 / 3.11 | 9.93 / 3.79 | 31.20 / 11.44 |
@@ -142,7 +142,7 @@ Table 1 summarizes the performance of the model with the Transducer and CTC decoders across different datasets.
 
 Table 2 provides the performance of the model when punctuation marks are separated during evaluation, using both the Transducer and CTC decoders.
 
-| Model | MCV %WER/CER test | MLS %WER/CER test | Voxpopuli %WER/CER test | Fisher %WER/CER test |
+| Model | MCV %WER/CER | MLS %WER/CER | Voxpopuli %WER/CER | Fisher %WER/CER |
 |-----------|--------------|---------------|--------------|---------------|
 | RNNT head | 6.79 / 2.16 | 11.63 / 3.96 | 8.84 / 4.06 | 27.88 / 13.40 |
 | CTC head | 7.39 / 2.39 | 11.81 / 4.01 | 9.17 / 4.17 | 27.81 / 13.14 |
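The WER and CER numbers in the tables above are both Levenshtein-distance error rates, differing only in the unit: words for WER, characters for CER. As a minimal sketch (not the model card's actual evaluation script, which likely uses NeMo's built-in metrics), the two can be computed like this:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences, via a
    single-row dynamic-programming table."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # row for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                        # deletion
                dp[j - 1] + 1,                    # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution / match
            )
            prev = cur
    return dp[n]

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance over reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def cer(ref, hyp):
    """Character Error Rate: character-level edit distance over
    reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

For example, `wer("hola que tal", "hola tal")` is 1/3 (one deleted word out of three reference words). Libraries such as `jiwer` provide production-grade versions of these metrics.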