imedennikov committed
Commit dd1e3ba
Parent: 2928bdb

Update README.md

Files changed (1): README.md (+3, -3)
README.md CHANGED
@@ -121,7 +121,7 @@ pip install nemo_toolkit['asr']

 ## How to Use this Model

- The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+ The model is available for use in the NeMo Framework [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

 ### Automatically instantiate the model
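The README paragraph touched by this hunk says the checkpoint can be loaded through the NeMo Framework [3] for inference or fine-tuning. A minimal sketch of that flow, assuming the standard `ASRModel.from_pretrained` API; the model identifier and audio path below are placeholders, not values taken from this repository:

```python
# Minimal inference sketch, assuming the standard NeMo Framework API.
# "<this-model-id>" and "sample.wav" are placeholders.
import nemo.collections.asr as nemo_asr

# Download and instantiate the pre-trained checkpoint.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="<this-model-id>")

# transcribe() accepts a list of paths to 16 kHz mono WAV files.
print(asr_model.transcribe(["sample.wav"]))
```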
@@ -164,7 +164,7 @@ TDT (Token-and-Duration Transducer) [2] is a generalization of conventional Tran

 ## Training

- The NeMo toolkit [3] was used for training this model with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/hybrid_transducer_ctc/fastconformer_hybrid_transducer_ctc_bpe.yaml).
+ The NeMo Framework [3] was used for training this model with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/hybrid_transducer_ctc/fastconformer_hybrid_transducer_ctc_bpe.yaml).

 The model was trained for 300k steps with dynamic bucketing and a batch duration of 600s per GPU on 32 NVIDIA A100 80GB GPUs, and then finetuned for 100k additional steps on the modified training data (predicted texts for training samples with CER>10%).
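The training section renamed by this hunk points at NeMo's Hydra-driven example script and base config. A hedged sketch of fine-tuning from the pre-trained checkpoint rather than launching that script, assuming NeMo's `setup_training_data` and PyTorch Lightning `Trainer` flow; the manifest path, batch size, and step count are illustrative and do not reproduce the recipe quoted above (300k steps, 600s batches, 32 A100s):

```python
# Fine-tuning sketch, assuming NeMo's standard restore-and-fit flow.
# The manifest path, batch size, and max_steps are placeholders, not
# this model's actual training settings.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="<this-model-id>")

# Point the training dataloader at a NeMo-style JSON manifest.
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})

trainer = pl.Trainer(accelerator="gpu", devices=1, max_steps=1000)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
```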
@@ -204,7 +204,7 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

 [2] [Efficient Sequence Transduction by Jointly Predicting Tokens and Durations](https://arxiv.org/abs/2304.06795)

- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+ [3] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)

 [4] [Google SentencePiece Tokenizer](https://github.com/google/sentencepiece)
 