---
license: cc-by-nc-4.0
language:
- bn
library_name: nemo
pipeline_tag: automatic-speech-recognition
---
## Hishab BN FastConformer
__Hishab BN FastConformer__ is a [FastConformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#fast-conformer)-based model trained on ~18K hours of the [MegaBNSpeech]() corpus.
## Usage
This model can be used to transcribe Bangla audio, or as a pre-trained model for fine-tuning on custom datasets with the [NeMo](https://github.com/NVIDIA/NeMo) framework.
### Installation
To install [NeMo](https://github.com/NVIDIA/NeMo), see the NeMo documentation.
### Inference
```py
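import nemo.collections.asr as nemo_asr
# Load the pretrained checkpoint by name (downloaded on first use).
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/hishab_bn_fastconformer")
# Transcribe a list of audio file paths; one result is returned per file.
transcriptions = asr_model.transcribe(["file.wav"])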
```
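For the fine-tuning use case mentioned above, NeMo ASR training reads its data from a JSON-lines manifest: one object per utterance with `audio_filepath`, `duration`, and `text` keys. The following is a minimal sketch of building such a manifest with the Python standard library only. The helper names (`wav_duration_seconds`, `write_manifest`) and any file paths are hypothetical, and the sketch assumes uncompressed PCM WAV input.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(samples, manifest_path):
    """Write (wav_path, transcript) pairs as a JSON-lines manifest.

    Each line holds the keys NeMo ASR datasets expect:
    audio_filepath, duration (seconds), and text.
    """
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, transcript in samples:
            entry = {
                "audio_filepath": wav_path,
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": transcript,
            }
            # ensure_ascii=False keeps Bangla text readable in the file.
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

A call such as `write_manifest([("clip_0001.wav", "...")], "train_manifest.json")` would produce one line per clip; the resulting path is what a NeMo training config would point to as its training manifest.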
## Training Datasets
| Channel Category | Hours |
| ---------------- | ----------: |
| News | 17,640.00 |
| Talkshow | 688.82 |
| Vlog | 0.02 |
| Crime Show | 4.08 |
| Total | 18,332.92 |
## Training Details
The training dataset comprises 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.
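As a quick sanity check on these figures, the short sketch below sums the per-category hours from the table and computes each category's share of the corpus. The variable names are illustrative only; the numbers are copied from the table above.

```python
# Hours per channel category, copied from the table above.
hours = {
    "News": 17640.00,
    "Talkshow": 688.82,
    "Vlog": 0.02,
    "Crime Show": 4.08,
}

total_hours = sum(hours.values())

# Share of the corpus contributed by each category, in percent.
shares = {name: 100.0 * h / total_hours for name, h in hours.items()}

for name, share in shares.items():
    print(f"{name:<12}{hours[name]:>12,.2f} h{share:>9.4f} %")
print(f"{'Total':<12}{total_hours:>12,.2f} h")
```

The total matches the 18,332.92 hours in the table, and news content accounts for roughly 96% of it, so the training data is heavily weighted toward broadcast news speech.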
## Evaluation
![image/png](s3://crabby-images/1b651/1b651348458f3c900a383f6df17f51c87bc8983c)
![image/png](s3://crabby-images/90e03/90e03de36e8791e334d0b06f05236062c32c4615)
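The evaluation results above are given only as images. To score the model's transcriptions on your own test set, one common choice is word error rate (WER); whether the figures above report WER is an assumption, not something this card states. The following is a minimal pure-Python sketch of WER as word-level edit distance divided by reference length. The function name `word_error_rate` is hypothetical and no text normalization is applied.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

In practice, both the reference and the hypothesis would usually be normalized (punctuation, numerals, spacing) before scoring, since WER is sensitive to such differences.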
## Citation