---
language:
- ml
tags:
- audio
- automatic-speech-recognition
- vegam
license: mit
datasets:
- google/fleurs
- thennal/IMaSC
- mozilla-foundation/common_voice_11_0
library_name: ctranslate2
---
# vegam-whisper-medium-ml-int8 (വേഗം)
> This model supports `int8` quantization only.
> Note: Model file size is 785 MB.
This is a conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.
This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).
## Installation
- Install [faster-whisper](https://github.com/guillaumekln/faster-whisper). See the [faster-whisper installation instructions](https://github.com/guillaumekln/faster-whisper/tree/master#installation) for more details.
```
pip install faster-whisper
```
- Install [git-lfs](https://git-lfs.com/). It is needed only to download the model from Hugging Face.
```
apt-get install git-lfs
```
- Download the model weights
```
git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml-int8
```
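As an alternative to `git clone`, the model files can likely be fetched from Python with the `huggingface_hub` package. This is a minimal sketch, not the author's documented method: it assumes `huggingface_hub` is installed (`pip install huggingface_hub`), and the helper name `download_model` is hypothetical.

```python
from pathlib import Path

# Repository id taken from the git clone URL above.
REPO_ID = "kurianbenoy/vegam-whisper-medium-ml-int8"


def download_model(local_dir: str = "vegam-whisper-medium-ml-int8") -> Path:
    """Download the model repository into local_dir and return its path.

    Hypothetical helper: huggingface_hub is imported lazily so this
    module can be loaded even when the package is not installed.
    """
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_model()` should leave the weights in a `vegam-whisper-medium-ml-int8` directory, which is the same path the usage code expects.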
## Usage
```
from faster_whisper import WhisperModel

# Path to the directory created by `git clone` above
model_path = "vegam-whisper-medium-ml-int8"

# Load the int8-quantized model on CPU
model = WhisperModel(model_path, device="cpu", compute_type="int8")

# Transcribe the audio; segments are produced lazily as they are iterated
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
## Example
```
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml-int8"
model = WhisperModel(model_path, device="cpu", compute_type="int8")

# Sample audio file stored alongside the model weights
segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
> Detected language 'ta' with probability 0.353516
> [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ
Note: The audio file [00b38e80-80b8-4f70-babf-566e848879fc.webm](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml/blob/main/00b38e80-80b8-4f70-babf-566e848879fc.webm) is from [Malayalam Speech Corpus](https://blog.smc.org.in/malayalam-speech-corpus/) and is stored along with model weights.
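In the output above, automatic language detection reports `ta` with low probability even though the audio is Malayalam. faster-whisper's `transcribe` method accepts a `language` argument, so passing `language="ml"` skips detection and may give more predictable results. The sketch below assumes the model directory from the installation steps; the helper names `format_segment` and `transcribe_malayalam` are hypothetical.

```python
def format_segment(start: float, end: float, text: str) -> str:
    """Format one segment the same way as the example output above."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text)


def transcribe_malayalam(audio_path: str,
                         model_path: str = "vegam-whisper-medium-ml-int8") -> list:
    """Transcribe audio_path with the language fixed to Malayalam.

    Hypothetical helper: faster_whisper is imported lazily so that
    format_segment can be used without the package installed.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path, device="cpu", compute_type="int8")
    # language="ml" tells the model to skip automatic language detection.
    segments, _info = model.transcribe(audio_path, beam_size=5, language="ml")
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

For example, `transcribe_malayalam("00b38e80-80b8-4f70-babf-566e848879fc.webm")` would return a list of formatted lines, one per segment.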
## Conversion Details
This conversion was made possible by the wonderful [CTranslate2 library](https://github.com/OpenNMT/CTranslate2), using its [Transformers converter for OpenAI Whisper](https://opennmt.net/CTranslate2/guides/transformers.html#whisper). The original model was converted with the following command:
```
ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml-int8 \
--quantization int8
```
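The same conversion can probably be driven from Python through CTranslate2's `TransformersConverter` class. This is a rough sketch rather than the command the author ran: it assumes `ctranslate2`, `transformers`, and `torch` are installed, and the wrapper name `convert_to_ct2` is hypothetical.

```python
def convert_to_ct2(source_model: str = "thennal/whisper-medium-ml",
                   output_dir: str = "vegam-whisper-medium-ml-int8",
                   quantization: str = "int8") -> str:
    """Convert a Transformers Whisper checkpoint to the CTranslate2 format.

    Hypothetical wrapper: ctranslate2 is imported lazily so the function
    can be defined without the package installed.
    """
    from ctranslate2.converters import TransformersConverter

    # Mirrors: ct2-transformers-converter --model ... --output_dir ... --quantization int8
    converter = TransformersConverter(source_model)
    return converter.convert(output_dir, quantization=quantization)
```

The defaults mirror the model name, output directory, and quantization used in the command above.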
## Many Thanks to
- Creators of CTranslate2 and faster-whisper
- Thennal D K
- Santhosh Thottingal