Automate Docker building
- README.md +137 -47
- README.qmd +4 -4
- docker/scripts/setup-whisperbot.sh +2 -7
- docker/scripts/setup.sh +6 -0
- requirements.txt +4 -1
README.md
CHANGED
# WhisperBot

Welcome to WhisperBot. WhisperBot builds upon the capabilities of
[WhisperLive](https://github.com/collabora/WhisperLive) and
[WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
integrating Mistral, a Large Language Model (LLM), on top of the
real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper,
a powerful automatic speech recognition (ASR) system. Both Mistral and
Whisper are optimized to run efficiently as TensorRT engines, maximizing
performance and real-time processing capabilities.

## Features

- **Real-Time Speech-to-Text**: Utilizes OpenAI WhisperLive to convert
  spoken language into text in real-time.

- **Large Language Model Integration**: Adds Mistral, a Large Language
  Model, to enhance the understanding and context of the transcribed
  text.

- **TensorRT Optimization**: Both Mistral and Whisper are optimized to
  run as TensorRT engines, ensuring high-performance and low-latency
  processing.

## Prerequisites

Install
[TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md)
to build Whisper and Mistral TensorRT engines. The TensorRT-LLM README
builds a docker image for TensorRT-LLM. Instead of building a docker
image, we can also refer to that README and the
[Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi)
to install the required packages in the base PyTorch docker image. Just
make sure to use the correct base image as mentioned in the Dockerfile,
and everything should go smoothly.
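If you take the docker route, the flow is roughly the following. This is only a sketch of the upstream instructions (the clone URL is real, but the exact make target and steps are defined by TensorRT-LLM and may change between releases), so follow the linked installation guide for the authoritative commands.

``` bash
# Sketch only: build the TensorRT-LLM container as described in its own README.
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
# The upstream Makefile exposes a release image build target;
# verify the target name against your checkout before relying on it.
make -C docker release_build
```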
### Build Whisper TensorRT Engine

> [!NOTE]
>
> These steps are included in `docker/scripts/build-whisper.sh`

Change working dir to the [whisper example
dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
in TensorRT-LLM.

``` bash
cd /root/TensorRT-LLM-examples/whisper
```

Currently, by default TensorRT-LLM only supports `large-v2` and ...

Download the required assets

``` bash
# the sound filter definitions
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
# the small.en model weights
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
```

...

``` bash
pip install -r requirements.txt
python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
mkdir -p /root/scratch-space/models
cp -r whisper_small_en /root/scratch-space/models
```
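The engine directory built above is what the run step further down points `--whisper_tensorrt_path` at; where exactly the scratch space ends up mounted in the runtime container is an assumption, so adjust paths to your setup. A quick sanity check that the engine was produced and copied:

``` bash
# Paths taken from the commands above; run from the whisper example dir.
ls whisper_small_en
ls /root/scratch-space/models/whisper_small_en
```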
### Build Mistral TensorRT Engine

> [!NOTE]
>
> These steps are included in `docker/scripts/build-mistral.sh`

``` bash
cd /root/TensorRT-LLM-examples/llama
```

Build TensorRT for Mistral with `fp16`

``` bash
python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
                --dtype float16 \
                --remove_input_padding \
                ...
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
                --max_input_len 5000 \
                --max_batch_size 1
mkdir -p /root/scratch-space/models
cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
```
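As with Whisper, the Mistral engine ends up under the scratch-space models directory; the run example below refers to it as `/root/models/mistral`, which assumes that directory is exposed at that path in the runtime container. A quick check after the build:

``` bash
# Path taken from the cp command above.
ls /root/scratch-space/models/mistral
```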
### Build Phi TensorRT Engine

> [!NOTE]
>
> These steps are included in `docker/scripts/build-phi-2.sh`

Note: Phi is only available on the main branch and hasn't been released
yet, so make sure to build TensorRT-LLM from the main branch.

``` bash
cd /root/TensorRT-LLM-examples/phi
```

Build TensorRT for Phi-2 with `fp16`

``` bash
git lfs install
phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
python3 build.py --dtype=float16 \
                 --log_level=verbose \
                 --use_gpt_attention_plugin float16 \
                 ...
                 --max_batch_size=16 \
                 --max_input_len=1024 \
                 --max_output_len=1024 \
                 --output_dir=phi-2 \
                 --model_dir="$phi_path" >&1 | tee build.log
dest=/root/scratch-space/models
mkdir -p "$dest/phi-2/tokenizer"
cp -r phi-2 "$dest"
(cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
cp -r "$phi_path" "$dest/phi-orig-model"
```
## Build WhisperBot

> [!NOTE]
>
> These steps are included in `docker/scripts/setup-whisperbot.sh`

Clone this repo and install requirements

``` bash
[ -d "WhisperBot" ] || git clone https://github.com/collabora/WhisperBot.git
cd WhisperBot
apt update
apt install ffmpeg portaudio19-dev -y
```

Install torchaudio matching the PyTorch from the base image

``` bash
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
```

Install all the other dependencies normally

``` bash
pip install -r requirements.txt
pip install openai-whisper whisperspeech soundfile
```

Force update `huggingface_hub` (tokenizers 0.14.1 spuriously requires an
ancient \<=0.18 version)

``` bash
pip install -U huggingface_hub
huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
huggingface-cli download charactr/vocos-encodec-24khz
mkdir -p /root/.cache/torch/hub/checkpoints/
curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
mkdir -p /root/.cache/whisper-live/
curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
python -c 'from transformers.utils.hub import move_cache; move_cache()'
```

### Run WhisperBot with Whisper and Mistral/Phi-2

Take the folder path for the Whisper TensorRT model, and the folder_path
and tokenizer_path for the Mistral/Phi-2 TensorRT engine, from the build
phase. If a Hugging Face model was used to build Mistral/Phi-2, just use
the Hugging Face repo name as the tokenizer path.

> [!NOTE]
>
> These steps are included in `docker/scripts/run-whisperbot.sh`

``` bash
test -f /etc/shinit_v2 && source /etc/shinit_v2
cd WhisperBot
if [ "$1" != "mistral" ]; then
    exec python3 main.py --phi \
        --whisper_tensorrt_path /root/whisper_small_en \
        --phi_tensorrt_path /root/phi-2 \
        --phi_tokenizer_path /root/phi-2
else
    exec python3 main.py --mistral \
        --whisper_tensorrt_path /root/models/whisper_small_en \
        --mistral_tensorrt_path /root/models/mistral \
        --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
fi
```
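These run steps live in `docker/scripts/run-whisperbot.sh`, so the snippet above can also be invoked as a script, with the single positional argument selecting the LLM (anything other than `mistral` falls through to Phi-2). A usage sketch, assuming the engines sit at the paths shown above and that the script is called from the directory containing the WhisperBot checkout (it does `cd WhisperBot` itself):

``` bash
# Run with Phi-2 (the default branch of the script)
bash WhisperBot/docker/scripts/run-whisperbot.sh
# Run with Mistral instead
bash WhisperBot/docker/scripts/run-whisperbot.sh mistral
```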
- On the client side, clone the repo, install the requirements and
  execute `run_client.py`

``` bash
cd WhisperBot
pip install -r requirements.txt
python3 run_client.py
```

## Contact Us

For questions or issues, please open an issue. Contact us at:
marcus.edel@collabora.com, jpc@collabora.com,
vineet.suryan@collabora.com
README.qmd
CHANGED
# WhisperBot

Welcome to WhisperBot. WhisperBot builds upon the capabilities of [WhisperLive](https://github.com/collabora/WhisperLive) and [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.

## Features
- **Real-Time Speech-to-Text**: Utilizes OpenAI WhisperLive to convert spoken language into text in real-time.

...

### Build Whisper TensorRT Engine

```{python}
include_file('docker/scripts/build-whisper.sh')
```

### Build Mistral TensorRT Engine

```{python}
include_file('docker/scripts/build-mistral.sh')
```

### Build Phi TensorRT Engine

```{python}
include_file('docker/scripts/build-phi-2.sh')
```

## Build WhisperBot
docker/scripts/setup-whisperbot.sh
CHANGED
``` diff
@@ -7,15 +7,11 @@ cd WhisperBot
 apt update
 apt install ffmpeg portaudio19-dev -y
 
-##
-
-#apt install -y cmake
-#TORCH_CUDA_ARCH_LIST="8.9 9.0" pip install --no-build-isolation git+https://github.com/pytorch/audio.git
+## Install torchaudio matching the PyTorch from the base image
+pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
 
 ## Install all the other dependencies normally
-pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
 pip install -r requirements.txt
-pip install openai-whisper whisperspeech soundfile
 
 ## force update huggingface_hub (tokenizers 0.14.1 spuriously require and ancient <=0.18 version)
 pip install -U huggingface_hub
@@ -29,4 +25,3 @@ mkdir -p /root/.cache/whisper-live/
 curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
 
 python -c 'from transformers.utils.hub import move_cache; move_cache()'
-
```
docker/scripts/setup.sh
ADDED
``` bash
#!/bin/bash -e

./setup-whisper.sh
#./setup-mistral.sh
./setup-phi-2.sh
./setup-whisperbot.sh
```
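The new script chains the individual setup scripts (with the Mistral one commented out) by relative path, so presumably it is meant to be run from the directory that holds these scripts inside the build container; that working directory is an assumption in the sketch below.

``` bash
# Assumption: run from the scripts directory so the relative ./setup-*.sh paths resolve.
cd docker/scripts
./setup.sh
```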
requirements.txt
CHANGED
``` diff
@@ -6,4 +6,7 @@ scipy
 websocket-client
 tiktoken==0.3.3
 kaldialign
-braceexpand
+braceexpand
+openai-whisper
+whisperspeech
+soundfile
```