IndicWhisper with JAX (faster)

IndicWhisper is a state-of-the-art speech recognition model fine-tuned on Indian languages. This repository contains the code for training and evaluating the model, as well as pre-trained checkpoints for immediate use.

Overview

IndicWhisper achieves low Word Error Rates (WERs) on benchmarks for Indian languages. On the Hindi subset of the Vistaar benchmark it has the lowest average WER (13.6) of the systems compared and the lowest WER on five of the seven test sets, outperforming other publicly available models on average.

Performance on Vistaar Benchmark (Hindi Subset)

Word Error Rate (%), lower is better.

Model	Kathbath	Kathbath-Hard	FLEURS	CommonVoice	IndicTTS	MUCS	Gramvaani	Average
Google STT	14.3	16.7	19.4	20.8	18.3	17.8	59.9	23.9
IndicWav2vec	12.2	16.2	18.3	20.2	15.0	22.9	42.1	21.0
Azure STT	13.6	15.1	24.3	14.6	15.2	15.1	42.3	20.0
Nvidia-medium	14.0	15.6	19.4	20.4	12.3	12.4	41.3	19.4
Nvidia-large	12.7	14.2	15.7	21.2	12.2	11.8	42.6	18.6
IndicWhisper	10.3	12.0	11.4	15.0	7.6	12.0	26.8	13.6
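
The WER figures in the table count word-level errors against a reference transcript. As a rough illustration of how the metric is defined, here is a minimal Python sketch. It is not the scoring script used for the benchmark, which is not shown in this README and would typically normalise text before scoring. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, not taken from any benchmark.
reference = "मुझे आज बाज़ार जाना है"
hypothesis = "मुझे कल बाज़ार जाना"
print(f"WER: {100 * word_error_rate(reference, hypothesis):.1f}%")  # WER: 40.0%
```

Here one word is substituted and one is dropped out of five reference words, so the WER is 2/5 = 40%. A table entry of 10.3 means roughly one word error per ten reference words.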

Usage

New Feature: JAX Mode

We have recently added a JAX mode, which speeds up inference on both TPUs and GPUs.

The JAX model is built on the 🤗 Indic Whisper implementation by AI4Bharat. It runs over 70x faster than the original Indic Whisper PyTorch code, which makes it the fastest Whisper implementation available.

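# Note: the class name is spelled "FlaxWhisperPipline" in the whisper_jax package.
from whisper_jax import FlaxWhisperPipline
import jax.numpy as jnp

# Load the Hindi checkpoint in bfloat16 (half precision) to speed up inference.
pipeline = FlaxWhisperPipline('parthiv11/indic_whisper_hi_multi_gpu', dtype=jnp.bfloat16)

# The first call JIT-compiles the model and is slow; later calls reuse the compiled code.
transcript = pipeline('sample.mp3')
print(transcript['text'])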
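
For longer recordings it can help to batch the audio and request timestamps. The sketch below builds on the snippet above and assumes the upstream whisper_jax pipeline options (`batch_size`, `task`, `return_timestamps`) behave with this checkpoint as documented for Whisper JAX. The helper names `format_chunks` and `transcribe_with_timestamps`, the batch size of 16, and the sample result are hypothetical, and the assumed result shape (a `chunks` list of dicts with `timestamp` and `text`) should be checked against your installed version.

```python
from typing import Any, Dict, List


def format_chunks(outputs: Dict[str, Any]) -> List[str]:
    """Turn a pipeline result with timestamped chunks into readable lines."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        # Defensive: treat a missing end time as unknown rather than failing.
        end_text = f"{end:.2f}" if end is not None else "?"
        lines.append(f"[{start:.2f} -> {end_text}] {chunk['text'].strip()}")
    return lines


def transcribe_with_timestamps(audio_path: str) -> Dict[str, Any]:
    """Run the JAX pipeline on one file (needs whisper_jax and a model download)."""
    # Imported lazily so format_chunks can be tried without JAX installed.
    import jax.numpy as jnp
    from whisper_jax import FlaxWhisperPipline

    pipeline = FlaxWhisperPipline(
        "parthiv11/indic_whisper_hi_multi_gpu", dtype=jnp.bfloat16, batch_size=16
    )
    return pipeline(audio_path, task="transcribe", return_timestamps=True)


# Hand-written stand-in for a pipeline result, so the formatter can be tried offline.
sample_outputs = {
    "text": " नमस्ते आप कैसे हैं",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " नमस्ते"},
        {"timestamp": (1.5, 3.25), "text": " आप कैसे हैं"},
    ],
}
for line in format_chunks(sample_outputs):
    print(line)
```

With a real file, `format_chunks(transcribe_with_timestamps('sample.mp3'))` would produce one line per segment. Building the pipeline once and reusing it across files avoids paying the compilation cost repeatedly.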

Acknowledgements

We would like to express our gratitude to the following organizations for their support:

EkStep Foundation for their generous grant, which facilitated the establishment of the Centre for AI4Bharat at IIT Madras.
The Ministry of Electronics and Information Technology (NLTM) for its grant to support the creation of datasets and models for Indian languages under the Bhashini project.
The Centre for Development of Advanced Computing, India (C-DAC), for providing access to the Param Siddhi supercomputer for training our models.
Microsoft for its grant to create datasets, tools, and resources for Indian languages.
For the JAX guide, see the repository on GitHub.

License

IndicWhisper and the associated Vistaar benchmark are MIT-licensed. This license applies to all the fine-tuned language models included in this repository.

Contributors

Kaushal Bhogale (AI4Bharat)
Sai Narayan Sundaresan (IITKGP, AI4Bharat)
Abhigyan Raman (AI4Bharat)
Tahir Javed (IITM, AI4Bharat)
Mitesh Khapra (IITM, AI4Bharat, RBCDSAI)
Pratyush Kumar (Microsoft, AI4Bharat)

Contributing

We welcome contributions from the community to further improve IndicWhisper. If you have any ideas, bug fixes, or enhancements, please feel free to submit a pull request.

Thank you for your interest in IndicWhisper! We hope it proves to be a valuable tool for your speech recognition needs in Indian languages.