metadata

title: Automatic speech recognition
sdk: gradio
app_file: src/app.py
python_version: "3.10"
sdk_version: 4.21.0
app_port: 7860
tags:
  - asr
  - stt
  - speech-to-text
  - whisper
  - pyannote
  - diarization
pinned: true
emoji: ✍️

Automatic speech recognition

Automatic speech recognition uses Whisper to transcribe audio files and pyannote-audio to add speaker diarization.
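As an illustration of how transcription and diarization output can be combined, the sketch below gives each transcript chunk the speaker whose turn overlaps it most in time. This is not necessarily how this repo does it. The `Segment` class, the `assign_speakers` function, the timestamps, and the `SPEAKER_00`-style labels are all hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text, or speaker name for a diarization turn


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time interval shared by a and b."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(chunks, turns):
    """Pair each transcript chunk with the speaker turn it overlaps most."""
    result = []
    for chunk in chunks:
        best = max(turns, key=lambda turn: overlap(chunk, turn), default=None)
        if best is None or overlap(chunk, best) == 0.0:
            speaker = "UNKNOWN"  # no diarization turn covers this chunk
        else:
            speaker = best.label
        result.append((speaker, chunk.label))
    return result


# Hypothetical example data: two transcript chunks and two speaker turns.
chunks = [Segment(0.0, 2.5, "Hello there."), Segment(2.5, 5.0, "Hi, how are you?")]
turns = [Segment(0.0, 2.4, "SPEAKER_00"), Segment(2.4, 6.0, "SPEAKER_01")]
labelled = assign_speakers(chunks, turns)
for speaker, text in labelled:
    print(f"{speaker}: {text}")
```

Matching on maximum overlap keeps the transcript chunks intact. A chunk that spans a speaker change is attributed to whoever spoke for most of it, which is one reason to review the output.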

Inference is optimized through batching and scaled dot-product attention (SDPA), or flash attention if available.
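The "flash attention if available, otherwise SDPA" choice could be made roughly as follows. This is a minimal sketch, assuming the model is loaded through Hugging Face Transformers, whose `from_pretrained` accepts an `attn_implementation` argument with the values `"flash_attention_2"` and `"sdpa"`. The helper name `pick_attention_implementation` is hypothetical, and the check only looks for an installed `flash_attn` package, not for a compatible GPU.

```python
import importlib.util


def pick_attention_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is installed, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


# Assumption: this value would be passed as attn_implementation=... when loading the model.
attn_implementation = pick_attention_implementation()
print(attn_implementation)
```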

:warning: Always review transcriptions. Transcriptions are done using AI models which are never 100% accurate.

The repo contains (or will contain) code to run the software:

as a command-line tool
as a graphical interface
as an inference API

Installation

Prerequisites

The host machine must have an NVIDIA graphics card and CUDA 12.x (preferably CUDA 12.1) installed natively, even when using Docker.

The graphics card should have at least 12 GB of VRAM for the largest model.
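To see how much VRAM the host's GPUs have, a small script along these lines may help. It is a sketch that assumes `nvidia-smi` is on the PATH (it ships with the NVIDIA driver). The function names are hypothetical. Compare the printed values with the 12 GB figure above yourself, since cards sold as 12 GB can report slightly less.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text: str):
    """Parse 'name, memory.total' CSV rows (memory in MiB) into (name, mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, mib = line.rpartition(",")
        if mib.strip().isdigit():  # skip rows without a numeric memory value
            gpus.append((name.strip(), int(mib.strip())))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the installed GPUs; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_memory(out)


gpus = query_gpus()
if not gpus:
    print("No NVIDIA GPU reported by nvidia-smi.")
for name, mib in gpus:
    print(f"{name}: {mib} MiB ({mib / 1024:.1f} GiB)")
```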

The host machine must have Docker installed.

For a Linux server, follow these instructions

For a desktop (visual UI available), follow these instructions

Docker (recommended)

Build the Docker image (make sure Docker is running on your system)

docker build -t asr .

Run the Docker image, forward port 7860 (Gradio) and pass your GPU(s) to the container

docker run -p 7860:7860 --gpus all asr

Or in detached mode (in background)

docker run -d -p 7860:7860 --gpus all asr

You can check whether it is running with

docker ps

To follow the terminal output of a detached container, use

docker logs -f <first n digits of the container id>

The first time a transcription is requested, the model is downloaded. To avoid downloading it again each time, stop and restart the same container instead of creating a new one with docker run ... again:

docker start <first n digits of the container id>

To list all containers, including stopped ones, use

docker ps -a

To open the app, open your browser and go to localhost:7860
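If the page does not load, it can help to check whether anything is listening on the forwarded port. The sketch below uses only the Python standard library. The helper name `is_port_open` is hypothetical, and the defaults simply mirror the port used above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if is_port_open():
    print("App is reachable at http://localhost:7860")
else:
    print("Nothing is listening on port 7860 yet; check docker ps and docker logs.")
```

A successful connection only shows that the port is open. The app may still be starting up or downloading a model.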

Dev Container

Open the project in Visual Studio Code, press CTRL + SHIFT + P, and type "Rebuild and reopen in container".

After building, open up a terminal and activate the virtual environment

source /home/jovyan/venv/bin/activate

Then run the app

python src/app.py

License

GNU General Public License v3.0 or later

See COPYING for the full text.