Spaces:

utkarsh-dixit
/

WhisperFusion

Paused

App Files Files Community

makaveli10 commited on Jan 26

Commit

fc01fde

•

1 Parent(s): 25adfbc

update README

Browse files

Files changed (1) hide show

README.md +28 -206

README.md CHANGED Viewed

@@ -10,10 +10,10 @@ Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of
 the [WhisperLive](https://github.com/collabora/WhisperLive) and
 [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
 integrating Mistral, a Large Language Model (LLM), on top of the
-real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper,
-a powerful automatic speech recognition (ASR) system. Both Mistral and
 Whisper are optimized to run efficiently as TensorRT engines, maximizing
-performance and real-time processing capabilities.
 ## Features
@@ -24,211 +24,33 @@ performance and real-time processing capabilities.
   Model, to enhance the understanding and context of the transcribed
   text.
-- **TensorRT Optimization**: Both Mistral and Whisper are optimized to
   run as TensorRT engines, ensuring high-performance and low-latency
   processing.
-## Prerequisites
-Install
-[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md)
-to build Whisper and Mistral TensorRT engines. The README builds a
-docker image for TensorRT-LLM. Instead of building a docker image, we
-can also refer to the README and the
-[Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi)
-to install the required packages in the base pytroch docker image. Just
-make sure to use the correct base image as mentioned in the dockerfile
-and everything should go nice and smooth.
-### Build Whisper TensorRT Engine
-> [!NOTE]
->
-> These steps are included in `docker/scripts/build-whisper.sh`
-Change working dir to the [whisper example
-dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
-in TensorRT-LLM.
-``` bash
-cd /root/TensorRT-LLM-examples/whisper
-```
-Currently, by default TensorRT-LLM only supports `large-v2` and
-`large-v3`. In this repo, we use `small.en`.
-Download the required assets
-``` bash
-# the sound filter definitions
-wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
-# the small.en model weights
-wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
-```
-We have to patch the script to add support for out model size
-(`small.en`):
-``` bash
-patch <<EOF
---- build.py.old  2024-01-17 17:47:47.508545842 +0100
-+++ build.py  2024-01-17 17:47:41.404941926 +0100
-@@ -58,6 +58,7 @@
-                         choices=[
-                             "large-v3",
-                             "large-v2",
-+                            "small.en",
-                         ])
-     parser.add_argument('--quantize_dir', type=str, default="quantize/1-gpu")
-     parser.add_argument('--dtype',
-EOF
-```
-Finally we can build the TensorRT engine for the `small.en` Whisper
-model:
-``` bash
-pip install -r requirements.txt
-python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin  --use_bert_attention_plugin --model_name small.en
-mkdir -p /root/scratch-space/models
-cp -r whisper_small_en /root/scratch-space/models
-```
-### Build Mistral TensorRT Engine
-> [!NOTE]
->
-> These steps are included in `docker/scripts/build-mistral.sh`
-``` bash
-cd /root/TensorRT-LLM-examples/llama
-```
-Build TensorRT for Mistral with `fp16`
-``` bash
-python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
-                --dtype float16 \
-                --remove_input_padding \
-                --use_gpt_attention_plugin float16 \
-                --enable_context_fmha \
-                --use_gemm_plugin float16 \
-                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-                --max_input_len 5000 \
-                --max_batch_size 1
-mkdir -p /root/scratch-space/models
-cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
-```
-### Build Phi TensorRT Engine
-> [!NOTE]
->
-> These steps are included in `docker/scripts/build-phi-2.sh`
-Note: Phi is only available in main branch and hasnt been released yet.
-So, make sure to build TensorRT-LLM from main branch.
-``` bash
-cd /root/TensorRT-LLM-examples/phi
-```
-Build TensorRT for Phi-2 with `fp16`
-``` bash
-git lfs install
-phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
-python3 build.py --dtype=float16                    \
-                 --log_level=verbose                \
-                 --use_gpt_attention_plugin float16 \
-                 --use_gemm_plugin float16          \
-                 --max_batch_size=16                \
-                 --max_input_len=1024               \
-                 --max_output_len=1024              \
-                 --output_dir=phi-2            \
-                 --model_dir="$phi_path" >&1 | tee build.log
-dest=/root/scratch-space/models
-mkdir -p "$dest/phi-2/tokenizer"
-cp -r phi-2 "$dest"
-(cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
-cp -r "$phi_path" "$dest/phi-orig-model"
-```
-## Build WhisperFusion
-> [!NOTE]
->
-> These steps are included in `docker/scripts/setup-whisperfusion.sh`
-Clone this repo and install requirements
-``` bash
-[ -d "WhisperFusion" ] || git clone https://github.com/collabora/WhisperFusion.git
-cd WhisperFusion
-apt update
-apt install ffmpeg portaudio19-dev -y
-```
-Install torchaudio matching the PyTorch from the base image
-``` bash
-pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
-```
-Install all the other dependencies normally
-``` bash
-pip install -r requirements.txt
-```
-force update huggingface_hub (tokenizers 0.14.1 spuriously require and
-ancient \<=0.18 version)
-``` bash
-pip install -U huggingface_hub
-huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
-huggingface-cli download charactr/vocos-encodec-24khz
-mkdir -p /root/.cache/torch/hub/checkpoints/
-curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
-mkdir -p /root/.cache/whisper-live/
-curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
-python -c 'from transformers.utils.hub import move_cache; move_cache()'
-```
-### Run WhisperFusion with Whisper and Mistral/Phi-2
-Take the folder path for Whisper TensorRT model, folder_path and
-tokenizer_path for Mistral/Phi-2 TensorRT from the build phase. If a
-huggingface model is used to build mistral/phi-2 then just use the
-huggingface repo name as the tokenizer path.
-> [!NOTE]
->
-> These steps are included in `docker/scripts/run-whisperfusion.sh`
-``` bash
-test -f /etc/shinit_v2 && source /etc/shinit_v2
-cd WhisperFusion
-if [ "$1" != "mistral" ]; then
-  exec python3 main.py --phi \
-                  --whisper_tensorrt_path /root/whisper_small_en \
-                  --phi_tensorrt_path /root/phi-2 \
-                  --phi_tokenizer_path /root/phi-2
-else
-  exec python3 main.py --mistral \
-                  --whisper_tensorrt_path /root/models/whisper_small_en \
-                  --mistral_tensorrt_path /root/models/mistral \
-                  --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
-fi
-```
-- On the client side clone the repo, install the requirements and
-  execute `run_client.py`
-``` bash
-cd WhisperFusion
-pip install -r requirements.txt
-python3 run_client.py
 ```
 ## Contact Us

 the [WhisperLive](https://github.com/collabora/WhisperLive) and
 [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
 integrating Mistral, a Large Language Model (LLM), on top of the
+real-time speech-to-text pipeline. Both LLM and
 Whisper are optimized to run efficiently as TensorRT engines, maximizing
+performance and real-time processing capabilities. While WhiperSpeech is
+optimized with torch.compile.
 ## Features
   Model, to enhance the understanding and context of the transcribed
   text.
+- **TensorRT Optimization**: Both LLM and Whisper are optimized to
   run as TensorRT engines, ensuring high-performance and low-latency
   processing.
+- **torch.compile**: WhisperSpeech uses torch.compile to speed up
+  inference which makes PyTorch code run faster by JIT-compiling PyTorch
+  code into optimized kernels.
+## Getting Started
+- We provide a pre-built TensorRT-LLM docker container that has both whisper and
+  phi converted to TensorRT engines and WhisperSpeech model is pre-downloaded to
+  quickly start interacting with WhisperFusion.
+```bash
+ docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest
+```
+- Start Web GUI
+```bash
+ cd examples/chatbot/html
+ python -m http.server
+```
+## Build Docker Image
+- We provide the docker image for cuda-architecures 89 and 90. If you have a GPU
+  with a different cuda architecture. For e.g. to build for RTX 3090 with cuda-
+  architecture 86
+```bash
+ bash build.sh 86-real
 ```
 ## Contact Us