Whisper-medium-vaani-hindi

This is a fine-tuned version of OpenAI's Whisper-Medium, trained on approximately 718 hours of transcribed Hindi speech from multiple datasets.

Usage

The model can be used with the pipeline function from the Hugging Face Transformers library.


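import torch
from transformers import pipeline

# Path to the audio file to be transcribed
audio = "path to the audio file to be transcribed"
# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "ARTPARK-IISc/whisper-medium-vaani-hindi"

# chunk_length_s=30 splits long recordings into 30-second windows
transcribe = pipeline(task="automatic-speech-recognition", model=model_id, chunk_length_s=30, device=device)
# Force Hindi transcription instead of relying on automatic language detection
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

print("Transcription:", transcribe(audio)["text"])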
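
To transcribe several recordings in one go, the same pipeline can be wrapped in a small helper. The following is a minimal sketch, not part of the original model card: the function names list_audio_files and transcribe_folder, the AUDIO_EXTENSIONS set, and the folder name in the usage comment are all hypothetical. It assumes a recent Transformers release in which the Whisper language and task can be passed through generate_kwargs rather than set on the model config.

```python
from pathlib import Path

# Assumed set of extensions to treat as audio; adjust to your data.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def list_audio_files(folder):
    """Return the audio files in `folder`, sorted by path (hypothetical helper)."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(folder, model_id="ARTPARK-IISc/whisper-medium-vaani-hindi"):
    """Transcribe every audio file in `folder`; returns {file name: text}."""
    # Heavy dependencies are imported lazily so that list_audio_files
    # remains usable without torch or transformers installed.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        device=device,
    )
    results = {}
    for path in list_audio_files(folder):
        # Assumption: language and task are forwarded to Whisper's generate().
        output = asr(str(path), generate_kwargs={"language": "hi", "task": "transcribe"})
        results[path.name] = output["text"]
    return results


# Example call (downloads the model on first use):
# results = transcribe_folder("recordings")
```

Building the pipeline once and reusing it for every file avoids reloading the model weights for each recording.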

Training and Evaluation

The model was fine-tuned on the following datasets: Vaani, Gramvaani, IndicVoices, Fleurs, IndicTTS, and Commonvoice.

The performance of the model was evaluated using multiple datasets, and the evaluation results are provided below.

Dataset           WER (%)
Gramvaani         27.64
Fleurs            14.34
IndicTTS           7.78
MUCS              23.46
Commonvoice       19.90
Kathbath          14.29
Kathbath Noisy    16.03
Vaani             25.48
RESPIN             8.79
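
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below implements that standard definition in plain Python. The model card does not say which tool or text normalization produced the figures above, so this is illustrative only; the function name word_error_rate and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage of the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return 100.0 * previous[-1] / len(ref)


# Illustrative sentences: one substituted word out of six reference words.
reference = "मैं आज बाज़ार जा रहा हूँ"
hypothesis = "मैं कल बाज़ार जा रहा हूँ"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}%")
```

Reported WER values depend heavily on how punctuation, numerals, and Unicode variants are normalized before scoring, so figures computed this way may not match the table exactly.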