---
license: cc-by-nc-4.0
language:
- bn
library_name: nemo
pipeline_tag: automatic-speech-recognition
---

## Hishab BN FastConformer

__Hishab BN FastConformer__ is a [FastConformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#fast-conformer)-based model trained on the ~18K-hour [MegaBNSpeech]() corpus.

## Usage

This model can be used to transcribe Bangla audio, or as a pre-trained model for fine-tuning on custom datasets with the [NeMo](https://github.com/NVIDIA/NeMo) framework.

### Installation

To install [NeMo](https://github.com/NVIDIA/NeMo), see the NeMo documentation.

### Inference

```py
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub and restore the model.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/hishab_bn_fastconformer")

# transcribe() takes a list of audio file paths and returns one result per file.
transcriptions = asr_model.transcribe(["file.wav"])
print(transcriptions[0])
```

## Training Datasets

| Channel Category | Hours     |
| ---------------- | --------- |
| News             | 17,640.00 |
| Talk Show        | 688.82    |
| Vlog             | 0.02      |
| Crime Show       | 4.08      |
| Total            | 18,332.92 |

## Training Details

The training set comprises 17.64K hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.

## Evaluation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/WvMlp95z2-GXT6AYfwW8Y.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/O2RA9TAedIv1OTqgdIap5.png)

## Citation