Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -258,7 +258,7 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
258
  ## Explainability
259
 
260
  - High-Level Application and Domain: Automatic Speech Recognition
261
- - Describe how this model works: Model transcribes audio input into text for the Armenian language
262
  - Verified to have met prescribed quality standards: Yes
263
  - Performance Metrics: Word Error Rate (WER), Character Error Rate (CER), Real-Time Factor
264
  - Potential Known Risks: Transcripts may not be 100% accurate. Accuracy varies based on the characteristics of input audio (Domain, Use Case, Accent, Noise, Speech Type, Context of speech, etcetera).
@@ -267,19 +267,19 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
267
 
268
  **Test Hardware:** A6000 GPU
269
 
270
- The performance of Automatic Speech Recognition models is measuring using Word Error Rate (WER) and Char Error Rate (CER).
271
- Since this dataset is trained on multiple domains it will generally perform good at transcribing audio in general.
272
 
273
- The following tables summarizes the performance of the available models in this collection with the Transducer decoder.
274
  Performances of the ASR models are reported in terms of Word Error Rate (WER%) and Inverse Real-Time Factor (RTFx) with greedy decoding on test sets.
275
 
276
  - Transducer
277
- |**Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
278
  |----------|-------------|-------------------|----------------|----------------|----------------|----------------|
279
  | 2.0.0 | SentencePiece Unigram | 1024 | 9.90| 1535.45 | 12.32 | 1144.34 |
280
 
281
  - CTC
282
- |**Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
283
  |----------|-------------|-------------------|----------------|----------------|----------------|----------------|
284
  | 2.0.0 | SentencePiece Unigram | 1024 | 11.19 | 1891.04 | 13.23 | 1565.59 |
285
 
@@ -310,7 +310,7 @@ These are greedy WER numbers without external LM. More details on evaluation can
310
  - Non-streaming ASR model
311
  - Model outputs text in Armenian
312
  - Output text requires Inverse Text Normalization
313
- - Model is noise sensitive
314
  - Model is not applicable for life-critical applications.
315
 
316
  ### Access Reactions:
@@ -319,7 +319,7 @@ The Principle of Least Privilege (PoLP) is applied limiting access for dataset g
319
 
320
  ## NVIDIA Riva: Deployment
321
 
322
- [NVIDIA Riva](https://developer.nvidia.com/riva), is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
323
 
324
  Additionally, Riva provides:
325
 
 
258
  ## Explainability
259
 
260
  - High-Level Application and Domain: Automatic Speech Recognition
261
+ - Describe how this model works: The model transcribes audio input into text for the Armenian language
262
  - Verified to have met prescribed quality standards: Yes
263
  - Performance Metrics: Word Error Rate (WER), Character Error Rate (CER), Real-Time Factor
264
  - Potential Known Risks: Transcripts may not be 100% accurate. Accuracy varies based on the characteristics of input audio (Domain, Use Case, Accent, Noise, Speech Type, Context of speech, etcetera).
 
267
 
268
  **Test Hardware:** A6000 GPU
269
 
270
+ The performance of Automatic Speech Recognition models is measured using Word Error Rate (WER) and Char Error Rate (CER).
271
+ Since this dataset is trained on multiple domains, it will generally perform well at transcribing audio in general.
272
 
273
+ The following tables summarize the performance of the available models in this collection with the Transducer decoder.
274
  Performances of the ASR models are reported in terms of Word Error Rate (WER%) and Inverse Real-Time Factor (RTFx) with greedy decoding on test sets.
275
 
276
  - Transducer
277
+ |**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
278
  |----------|-------------|-------------------|----------------|----------------|----------------|----------------|
279
  | 2.0.0 | SentencePiece Unigram | 1024 | 9.90| 1535.45 | 12.32 | 1144.34 |
280
 
281
  - CTC
282
+ |**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
283
  |----------|-------------|-------------------|----------------|----------------|----------------|----------------|
284
  | 2.0.0 | SentencePiece Unigram | 1024 | 11.19 | 1891.04 | 13.23 | 1565.59 |
285
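
The WER, CER and RTFx columns in the tables above can be illustrated with a minimal sketch. This is not NeMo's evaluation code: the helper names (`edit_distance`, `wer`, `cer`, `rtfx`) are hypothetical, the text is assumed to be already normalized and whitespace-tokenized, and spaces are counted as characters for CER. WER and CER are the Levenshtein distance between reference and hypothesis divided by the reference length (in words or characters), and RTFx is the audio duration divided by the processing time.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent (assumes whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent (spaces counted as characters here)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse Real-Time Factor: seconds of audio processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER:  {wer(ref, hyp):.2f}%")
    print(f"CER:  {cer(ref, hyp):.2f}%")
    print(f"RTFx: {rtfx(3600.0, 2.0):.2f}")
```

Under this definition, a higher RTFx means faster inference, while lower WER and CER mean more accurate transcripts. Reported numbers also depend on text normalization choices that this sketch does not cover.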
 
 
310
  - Non-streaming ASR model
311
  - Model outputs text in Armenian
312
  - Output text requires Inverse Text Normalization
313
+ - Model is noise-sensitive
314
  - Model is not applicable for life-critical applications.
315
 
316
  ### Access Reactions:
 
319
 
320
  ## NVIDIA Riva: Deployment
321
 
322
+ [NVIDIA Riva](https://developer.nvidia.com/riva) is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
323
 
324
  Additionally, Riva provides:
325