jpc committed
Commit 3ec0fd4 • 1 Parent(s): 19da603

Convert the rest of the README examples into scripts

README.qmd CHANGED
@@ -45,74 +45,33 @@ Instead of building a docker image, we can also refer to the README and the [Doc
 ### Build Whisper TensorRT Engine
 
 ```{python}
-include_file('setup/setup-tensorrt-llm.sh')
+include_file('docker/scripts/setup-whisper.sh')
 ```
 
 ### Build Mistral TensorRT Engine
-- Change working dir to [llama example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama) in TensorRT-LLM folder.
-```bash
-cd TensorRT-LLM/examples/llama
-```
-- Convert Mistral to `fp16` TensorRT engine.
-```bash
-python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
-                --dtype float16 \
-                --remove_input_padding \
-                --use_gpt_attention_plugin float16 \
-                --enable_context_fmha \
-                --use_gemm_plugin float16 \
-                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-                --max_input_len 5000
-                --max_batch_size 1
+
+```{python}
+include_file('docker/scripts/setup-mistral.sh')
 ```
 
 ### Build Phi TensorRT Engine
-Note: Phi is only available in main branch and hasnt been released yet. So, make sure to build TensorRT-LLM from main branch.
-- Change working dir to [phi example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) in TensorRT-LLM folder.
-```bash
-cd TensorRT-LLM/examples/phi
-```
-- Build phi TensorRT engine
-```bash
-git lfs install
-git clone https://huggingface.co/microsoft/phi-2
-python3 build.py --dtype=float16 \
-                 --log_level=verbose \
-                 --use_gpt_attention_plugin float16 \
-                 --use_gemm_plugin float16 \
-                 --max_batch_size=16 \
-                 --max_input_len=1024 \
-                 --max_output_len=1024 \
-                 --output_dir=phi_engine \
-                 --model_dir=phi-2>&1 | tee build.log
-```
 
-## Run WhisperBot
-- Clone this repo and install requirements.
-```bash
-git clone https://github.com/collabora/WhisperBot.git
-cd WhisperBot
-apt update
-apt install ffmpeg portaudio19-dev -y
-pip install -r requirements.txt
+```{python}
+include_file('docker/scripts/setup-phi-2.sh')
 ```
 
-### Whisper + Mistral
-- Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Mistral TensorRT from the build phase. If a huggingface model is used to build mistral then just use the huggingface repo name as the tokenizer path.
-```bash
-python3 main.py --mistral
-                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
-                --mistral_tensorrt_path /root/TensorRT-LLM/examples/llama/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-                --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
+## Build WhisperBot
+
+```{python}
+include_file('docker/scripts/setup-whisperbot.sh')
 ```
 
-### Whisper + Phi
-- Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Phi TensorRT from the build phase. If a huggingface model is used to build phi then just use the huggingface repo name as the tokenizer path.
-```bash
-python3 main.py --phi
-                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
-                --phi_tensorrt_path /root/TensorRT-LLM/examples/phi/phi_engine \
-                --phi_tokenizer_path /root/TensorRT-LLM/examples/phi/phi-2
+### Run WhisperBot with Whisper and Mistral/Phi-2
+
+Take the folder path of the Whisper TensorRT model, and the folder_path and tokenizer_path of the Mistral/Phi-2 TensorRT engine, from the build phase. If a Hugging Face model was used to build Mistral/Phi-2, just use the Hugging Face repo name as the tokenizer path.
+
+```{python}
+include_file('docker/scripts/run-whisperbot.sh')
 ```
 
 - On the client side clone the repo, install the requirements and execute `run_client.py`
@@ -122,7 +81,6 @@ pip install -r requirements.txt
 python3 run_client.py
 ```
 
-
 ## Contact Us
 For questions or issues, please open an issue.
 Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
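The `{python}` cells above are executed when README.qmd is rendered; `include_file` is a helper defined in a part of README.qmd not shown in this diff, which splices each script's contents into the output. A minimal sketch of re-rendering the README, assuming Quarto and a Jupyter Python kernel are installed:

```bash
# re-render README.qmd so the include_file() cells pull in the current scripts
quarto render README.qmd --to gfm
```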
docker/Dockerfile ADDED
@@ -0,0 +1,8 @@
+FROM ghcr.io/collabora/whisperbot-base:latest as base
+
+WORKDIR /root
+COPY scripts/setup-whisperbot.sh scripts/run-whisperbot.sh scratch-space/models /root/
+RUN ./setup-whisperbot.sh
+
+CMD ./run-whisperbot.sh
+
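The final image expects `scratch-space/models` to be populated before the build and starts the bot through `run-whisperbot.sh`. A sketch of running it, reusing the GPU and shared-memory flags from `docker/build.sh`; the port mapping is an assumption, since the port `main.py` serves on is not shown in this commit:

```bash
# -p 9090:9090 is an assumption -- substitute whatever port main.py listens on
docker run --gpus all --shm-size 64G -p 9090:9090 ghcr.io/collabora/whisperbot:latest
```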
docker/base-image/Dockerfile ADDED
@@ -0,0 +1,13 @@
+#ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
+#ARG BASE_TAG=23.10-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/cuda
+ARG BASE_TAG=12.2.2-devel-ubuntu22.04
+
+FROM ${BASE_IMAGE}:${BASE_TAG} as base
+
+WORKDIR /root
+COPY install-deps.sh /root
+RUN bash install-deps.sh && rm install-deps.sh
+
+COPY install-trt-llm.sh /root
+RUN bash install-trt-llm.sh && rm install-trt-llm.sh
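Because the base image and tag are plain build arguments, the commented-out NGC PyTorch base can be swapped back in without editing the Dockerfile, for example:

```bash
# build the base image on top of the NGC PyTorch image instead of plain CUDA
docker build \
  --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
  --build-arg BASE_TAG=23.10-py3 \
  -t ghcr.io/collabora/whisperbot-base:latest .
```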
docker/base-image/install-deps.sh ADDED
@@ -0,0 +1,54 @@
+#!/bin/bash -e
+
+apt-get update && apt-get -y install git git-lfs
+git clone --depth=1 -b cuda12.2 https://github.com/makaveli10/TensorRT-LLM.git
+cd TensorRT-LLM
+git checkout main
+git submodule update --init --recursive
+git lfs install
+git lfs pull
+
+# do not reinstall CUDA (our base image provides the same exact versions)
+patch -p1 <<EOF
+diff --git a/docker/common/install_tensorrt.sh b/docker/common/install_tensorrt.sh
+index 2dcb0a6..3a27e03 100644
+--- a/docker/common/install_tensorrt.sh
++++ b/docker/common/install_tensorrt.sh
+@@ -35,19 +35,7 @@ install_ubuntu_requirements() {
+ dpkg -i cuda-keyring_1.0-1_all.deb
+
+ apt-get update
+- if [[ $(apt list --installed | grep libcudnn8) ]]; then
+- apt-get remove --purge -y libcudnn8*
+- fi
+- if [[ $(apt list --installed | grep libnccl) ]]; then
+- apt-get remove --purge -y --allow-change-held-packages libnccl*
+- fi
+- if [[ $(apt list --installed | grep libcublas) ]]; then
+- apt-get remove --purge -y --allow-change-held-packages libcublas*
+- fi
+- CUBLAS_CUDA_VERSION=$(echo $CUDA_VER | sed 's/\./-/g')
+ apt-get install -y --no-install-recommends libcudnn8=${CUDNN_VER} libcudnn8-dev=${CUDNN_VER}
+- apt-get install -y --no-install-recommends libnccl2=${NCCL_VER} libnccl-dev=${NCCL_VER}
+- apt-get install -y --no-install-recommends libcublas-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER} libcublas-dev-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER}
+ apt-get clean
+ rm -rf /var/lib/apt/lists/*
+ }
+EOF
+
+cd docker/common/
+export BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
+export ENV=${ENV:-/etc/shinit_v2}
+bash install_base.sh
+bash install_cmake.sh
+source $ENV
+bash install_ccache.sh
+# later on TensorRT-LLM will force reinstall this version anyways
+pip3 install --extra-index-url https://download.pytorch.org/whl/cu121 torch
+bash install_tensorrt.sh
+bash install_polygraphy.sh
+source $ENV
+
+cd /root/TensorRT-LLM/docker/common/
+bash install_mpi4py.sh
+source $ENV
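The heredoc patch only drops the libnccl2/libcublas reinstall steps because the CUDA devel base image already ships those libraries (libcudnn8 is still installed by `install_tensorrt.sh`). A quick sanity check, not part of the commit, that they are really present:

```bash
# verify the base image provides the libraries the patched installer skips
dpkg -l | grep -E 'libnccl2|libcublas' || echo "expected CUDA libraries not found"
```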
docker/base-image/install-trt-llm.sh ADDED
@@ -0,0 +1,14 @@
+#!/bin/bash -e
+
+export ENV=${ENV:-/etc/shinit_v2}
+source $ENV
+
+cd /root/TensorRT-LLM
+python3 scripts/build_wheel.py --clean --cuda_architectures "89-real;90-real" --trt_root /usr/local/tensorrt
+pip install build/tensorrt_llm-0.7.1-cp310-cp310-linux_x86_64.whl
+mv examples ../TensorRT-LLM-examples
+cd ..
+
+rm -rf TensorRT-LLM
+# we don't need static libraries and they take a lot of space
+(cd /usr && find . -name '*static.a' | grep -v cudart_static | xargs rm -f)
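`--cuda_architectures "89-real;90-real"` restricts the wheel to Ada (compute capability 8.9, e.g. RTX 4090/L4) and Hopper (9.0) GPUs. Building for other hardware means listing its compute capability instead; a hypothetical variant for an Ampere A100 (8.0):

```bash
# same build, targeting A100 (SM 8.0) instead of Ada/Hopper
python3 scripts/build_wheel.py --clean --cuda_architectures "80-real" \
    --trt_root /usr/local/tensorrt
```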
docker/build.sh ADDED
@@ -0,0 +1,14 @@
+#!/bin/bash -e
+
+[ -n "$VERBOSE" ] && ARGS="--progress plain"
+
+(
+  cd base-image &&
+  docker build $ARGS -t ghcr.io/collabora/whisperbot-base:latest .
+)
+
+mkdir -p scratch-space
+cp -r scripts/build-* scratch-space
+#docker run --gpus all --shm-size 64G -v "$PWD"/scratch-space:/root/scratch-space -w /root/scratch-space -it ghcr.io/collabora/whisperbot-base:latest ./build-models.sh
+
+docker build $ARGS -t ghcr.io/collabora/whisperbot:latest .
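Note that `build.sh` resolves `base-image/`, `scripts/` and `scratch-space/` relative to the current directory, so it should be run from `docker/`, and the `docker run` step that would populate `scratch-space/models` is commented out and has to be executed by hand. Typical usage:

```bash
cd docker
VERBOSE=1 ./build.sh   # VERBOSE switches docker build to plain progress output
```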
docker/publish.sh ADDED
@@ -0,0 +1,4 @@
+#!/bin/bash -e
+
+docker push ghcr.io/collabora/whisperbot-base:latest
+docker push ghcr.io/collabora/whisperbot:latest
docker/scripts/build-mistral.sh ADDED
@@ -0,0 +1,18 @@
+#!/bin/bash -e
+
+cd /root/TensorRT-LLM-examples/llama
+
+## Build TensorRT for Mistral with `fp16`
+
+python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
+                --dtype float16 \
+                --remove_input_padding \
+                --use_gpt_attention_plugin float16 \
+                --enable_context_fmha \
+                --use_gemm_plugin float16 \
+                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
+                --max_input_len 5000 \
+                --max_batch_size 1
+
+mkdir -p /root/scratch-space/models
+cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
docker/scripts/build-models.sh ADDED
@@ -0,0 +1,7 @@
+#!/bin/bash -e
+
+test -f /etc/shinit_v2 && source /etc/shinit_v2
+
+./build-whisper.sh
+# ./build-mistral.sh
+./build-phi-2.sh
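This script is meant to run inside the base image with the scratch space mounted; the commented-out line in `docker/build.sh` shows the intended invocation:

```bash
# run from the docker/ directory once the base image is built
docker run --gpus all --shm-size 64G \
    -v "$PWD"/scratch-space:/root/scratch-space \
    -w /root/scratch-space -it \
    ghcr.io/collabora/whisperbot-base:latest ./build-models.sh
```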
docker/scripts/build-phi-2.sh ADDED
@@ -0,0 +1,25 @@
+#!/bin/bash -e
+
+## Note: Phi is only available in the main branch and hasn't been released yet, so make sure to build TensorRT-LLM from the main branch.
+
+cd /root/TensorRT-LLM-examples/phi
+
+## Build TensorRT for Phi-2 with `fp16`
+
+git lfs install
+phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
+python3 build.py --dtype=float16 \
+                 --log_level=verbose \
+                 --use_gpt_attention_plugin float16 \
+                 --use_gemm_plugin float16 \
+                 --max_batch_size=16 \
+                 --max_input_len=1024 \
+                 --max_output_len=1024 \
+                 --output_dir=phi-2 \
+                 --model_dir="$phi_path" >&1 | tee build.log
+
+dest=/root/scratch-space/models
+mkdir -p "$dest/phi-2/tokenizer"
+cp -r phi-2 "$dest"
+(cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
+cp -r "$phi_path" "$dest/phi-orig-model"
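`huggingface-cli download` prints the local snapshot directory it resolved, which is what `phi_path` captures; pinning `--revision` keeps the build reproducible. Roughly, assuming the default cache location:

```bash
# phi_path points into the local Hugging Face cache, e.g.:
echo "$phi_path"
# ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/834565c23f9b...
```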
setup/setup-tensorrt-llm.sh β†’ docker/scripts/build-whisper.sh RENAMED
@@ -1,13 +1,13 @@
-#!/bin/bash
+#!/bin/bash -e
 
 ## Change working dir to the [whisper example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
-cd TensorRT-LLM/examples/whisper
+cd /root/TensorRT-LLM-examples/whisper
 
 ## Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.
 ## Download the required assets
 
 # the sound filter definitions
-wget --directory-prefix=assets assets/mel_filters.npz https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
+wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
 # the small.en model weights
 wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
 
@@ -28,3 +28,6 @@ EOF
 ## Finally we can build the TensorRT engine for the `small.en` Whisper model:
 pip install -r requirements.txt
 python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
+
+mkdir -p /root/scratch-space/models
+cp -r whisper_small_en /root/scratch-space/models
docker/scripts/run-whisperbot.sh ADDED
@@ -0,0 +1,16 @@
+#!/bin/bash -e
+
+test -f /etc/shinit_v2 && source /etc/shinit_v2
+
+cd WhisperBot
+if [ "$1" != "mistral" ]; then
+  exec python3 main.py --phi \
+    --whisper_tensorrt_path /root/whisper_small_en \
+    --phi_tensorrt_path /root/phi-2 \
+    --phi_tokenizer_path /root/phi-2
+else
+  exec python3 main.py --mistral \
+    --whisper_tensorrt_path /root/models/whisper_small_en \
+    --mistral_tensorrt_path /root/models/mistral \
+    --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
+fi
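The script defaults to the Phi-2 backend and only switches to Mistral when `mistral` is passed as the first argument:

```bash
./run-whisperbot.sh           # Whisper + Phi-2 (the default branch)
./run-whisperbot.sh mistral   # Whisper + Mistral
```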
docker/scripts/setup-whisperbot.sh ADDED
@@ -0,0 +1,32 @@
+#!/bin/bash -e
+
+## Clone this repo and install requirements
+[ -d "WhisperBot" ] || git clone https://github.com/collabora/WhisperBot.git
+
+cd WhisperBot
+apt update
+apt install ffmpeg portaudio19-dev -y
+
+## NVIDIA containers are based on unreleased PyTorch versions so we have to manually install
+## torchaudio from source (`pip install torchaudio` would pull in all-new PyTorch and CUDA versions)
+#apt install -y cmake
+#TORCH_CUDA_ARCH_LIST="8.9 9.0" pip install --no-build-isolation git+https://github.com/pytorch/audio.git
+
+## Install all the other dependencies normally
+pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
+pip install -r requirements.txt
+pip install openai-whisper whisperspeech soundfile
+
+## force update huggingface_hub (tokenizers 0.14.1 spuriously requires an ancient <=0.18 version)
+pip install -U huggingface_hub
+
+huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
+huggingface-cli download charactr/vocos-encodec-24khz
+
+mkdir -p /root/.cache/torch/hub/checkpoints/
+curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
+mkdir -p /root/.cache/whisper-live/
+curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
+
+python -c 'from transformers.utils.hub import move_cache; move_cache()'
+
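Since the pinned torchaudio wheel has to match the PyTorch and CUDA versions already in the container, a quick post-install check (not part of the commit) can catch mismatches early:

```bash
# print the torch/torchaudio versions and the CUDA toolkit torch was built against
python3 -c 'import torch, torchaudio; print(torch.__version__, torchaudio.__version__, torch.version.cuda)'
```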