---
license: cc-by-nc-4.0
language:
- bn
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- ASR
- Automatic Speech Recognition
- Bangla ASR
- Bengali ASR
- bn asr
- Bangla fastconformer
- https://arxiv.org/abs/2311.03196
---
## Summary
__titu_stt_bn_fastconformer__ is a [FastConformer](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#fast-conformer)-based model trained on the ~18K-hour MegaBNSpeech corpus.
Details are in the paper: [https://aclanthology.org/2023.banglalp-1.16/](https://aclanthology.org/2023.banglalp-1.16/)
## Usage
This model can be used to transcribe Bangla audio. It can also serve as a pre-trained model for fine-tuning on custom datasets with the [NeMo](https://github.com/NVIDIA/NeMo) framework.
### Installation
To install [NeMo](https://github.com/NVIDIA/NeMo), see the NeMo documentation. For ASR models, install the toolkit with the `asr` extra:
```
pip install -q 'nemo_toolkit[asr]'
```
### Inference
Download the sample audio [test_bn_fastconformer.wav](https://huggingface.co/hishab/hishab_bn_fastconformer/blob/main/test_bn_fastconformer.wav), then run:
```py
# pip install -q 'nemo_toolkit[asr]'
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")
audio_file = "test_bn_fastconformer.wav"
# transcribe() takes a list of audio file paths
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']
```
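Sample-rate or channel mismatches are worth ruling out before debugging poor transcripts. As a minimal sketch, assuming the model expects 16 kHz mono WAV input (the usual setting for NeMo ASR models; confirm against the sample rate in the loaded model's preprocessor config), the standard-library `wave` module can flag files that may need conversion before calling `transcribe`. The `check_wav` helper and the `EXPECTED_RATE` value are hypothetical, not part of NeMo.

```python
import wave

# ASSUMPTION: 16 kHz mono input. Verify against the loaded model's config.
EXPECTED_RATE = 16_000


def check_wav(path: str, expected_rate: int = EXPECTED_RATE) -> list[str]:
    """Return human-readable problems with a WAV file; an empty list means it matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {rate} Hz")
    return problems
```

For example, `check_wav("test_bn_fastconformer.wav")` returns an empty list when the file matches the assumed format. Otherwise the file can be downmixed and resampled first, for instance with `ffmpeg -i in.wav -ac 1 -ar 16000 out.wav`.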
Colab notebook for inference: [Bangla FastConformer Infer.ipynb](https://colab.research.google.com/drive/1J3bxXlLBgSf1zOKVKbRYu1VrbEJFLlUc?usp=sharing)
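For fine-tuning on a custom dataset, NeMo's ASR data loaders read JSON-lines manifests in which each line holds an `audio_filepath`, a `duration` in seconds, and the reference `text`. The sketch below builds such a manifest from (WAV path, transcript) pairs using only the standard library. `write_manifest` and `wav_duration` are illustrative helper names, and the sketch assumes the audio is already in WAV format.

```python
import json
import wave
from pathlib import Path


def wav_duration(path: str) -> float:
    """Duration in seconds, read from the WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs, manifest_path: str) -> int:
    """Write (audio_path, transcript) pairs as a NeMo-style JSON-lines manifest.

    Returns the number of entries written.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path, text in pairs:
            entry = {
                "audio_filepath": str(Path(audio_path).resolve()),
                "duration": round(wav_duration(audio_path), 3),
                "text": text.strip(),
            }
            # ensure_ascii=False keeps Bangla text human-readable in the file
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The resulting file is what NeMo training configs typically reference via `model.train_ds.manifest_filepath` (and `model.validation_ds.manifest_filepath` for a held-out split). Keeping transcripts in the same normalization as the pretraining text helps avoid characters the model's existing tokenizer has not seen.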
## Training Datasets
| Channel category | Hours |
| ----------------- | ----------: |
| News | 17,640.00 |
| Talkshow | 688.82 |
| Vlog | 0.02 |
| Crime Show | 4.08 |
| Total | 18,332.92 |
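As a quick consistency check, the per-category hours in the table sum to the stated total of 18,332.92 hours, and news accounts for roughly 96% of the audio. The sketch below recomputes the total and each category's share from the table values; the `composition` helper is purely illustrative.

```python
# Hours per channel category, copied from the training-data table
HOURS = {
    "News": 17_640.00,
    "Talkshow": 688.82,
    "Vlog": 0.02,
    "Crime Show": 4.08,
}


def composition(hours: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total hours and each category's percentage share."""
    total = round(sum(hours.values()), 2)
    shares = {name: 100 * h / total for name, h in hours.items()}
    return total, shares


total, shares = composition(HOURS)
for name, pct in shares.items():
    print(f"{name:<10} {pct:6.2f}%")
print(f"{'Total':<10} {total:,.2f} h")
```

The heavy skew toward news content is worth keeping in mind when applying the model to conversational or other out-of-domain audio.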
## Training Details
The training set comprises 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.
## Evaluation
![Evaluation results](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/WvMlp95z2-GXT6AYfwW8Y.png)
![Evaluation results](https://cdn-uploads.huggingface.co/production/uploads/64df9253cccd823564c3303b/O2RA9TAedIv1OTqgdIap5.png)
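ASR results are conventionally reported as word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. To score the model on your own data, here is a minimal sketch of corpus-level WER. It is a plain reimplementation for illustration, not the paper's scoring script, and `word_errors` and `wer` are hypothetical helper names.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1], len(ref)


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = words = 0
    for ref, hyp in zip(references, hypotheses):
        e, n = word_errors(ref, hyp)
        edits += e
        words += n
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

Text normalization (punctuation, Unicode normalization forms of Bangla characters) can shift WER noticeably, so references and hypotheses should be normalized the same way before scoring.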
## Citation
```
@inproceedings{nandi-etal-2023-pseudo,
title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
author = "Nandi, Rabindra Nath and
Menon, Mehadi and
Muntasir, Tareq and
Sarker, Sagor and
Muhtaseem, Quazi Sarwar and
Islam, Md. Tariqul and
Chowdhury, Shammur and
Alam, Firoj",
editor = "Alam, Firoj and
Kar, Sudipta and
Chowdhury, Shammur Absar and
Sadeque, Farig and
Amin, Ruhul",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.banglalp-1.16",
doi = "10.18653/v1/2023.banglalp-1.16",
pages = "152--162",
    abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on pseudo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available. https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}
```