jpc committed
Commit 3ec0fd4 • 1 Parent(s): 19da603

Convert the rest of the README examples into scripts

README.qmd CHANGED
@@ -45,74 +45,33 @@ Instead of building a docker image, we can also refer to the README and the [Doc
 ### Build Whisper TensorRT Engine
 
 ```{python}
-include_file('setup/setup-tensorrt-llm.sh')
+include_file('docker/scripts/setup-whisper.sh')
 ```
 
 ### Build Mistral TensorRT Engine
-- Change working dir to [llama example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama) in TensorRT-LLM folder.
-```bash
-cd TensorRT-LLM/examples/llama
-```
-- Convert Mistral to `fp16` TensorRT engine.
-```bash
-python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
-                --dtype float16 \
-                --remove_input_padding \
-                --use_gpt_attention_plugin float16 \
-                --enable_context_fmha \
-                --use_gemm_plugin float16 \
-                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-                --max_input_len 5000
-                --max_batch_size 1
+
+```{python}
+include_file('docker/scripts/setup-mistral.sh')
 ```
 
 ### Build Phi TensorRT Engine
-Note: Phi is only available in main branch and hasnt been released yet. So, make sure to build TensorRT-LLM from main branch.
-- Change working dir to [phi example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) in TensorRT-LLM folder.
-```bash
-cd TensorRT-LLM/examples/phi
-```
-- Build phi TensorRT engine
-```bash
-git lfs install
-git clone https://huggingface.co/microsoft/phi-2
-python3 build.py --dtype=float16 \
-                 --log_level=verbose \
-                 --use_gpt_attention_plugin float16 \
-                 --use_gemm_plugin float16 \
-                 --max_batch_size=16 \
-                 --max_input_len=1024 \
-                 --max_output_len=1024 \
-                 --output_dir=phi_engine \
-                 --model_dir=phi-2>&1 | tee build.log
-```
 
-## Run WhisperBot
-- Clone this repo and install requirements.
-```bash
-git clone https://github.com/collabora/WhisperBot.git
-cd WhisperBot
-apt update
-apt install ffmpeg portaudio19-dev -y
-pip install -r requirements.txt
+```{python}
+include_file('docker/scripts/setup-phi-2.sh')
 ```
 
-### Whisper + Mistral
-- Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Mistral TensorRT from the build phase. If a huggingface model is used to build mistral then just use the huggingface repo name as the tokenizer path.
-```bash
-python3 main.py --mistral
-                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
-                --mistral_tensorrt_path /root/TensorRT-LLM/examples/llama/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-                --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
+## Build WhisperBot
+
+```{python}
+include_file('docker/scripts/setup-whisperbot.sh')
 ```
 
-### Whisper + Phi
-- Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Phi TensorRT from the build phase. If a huggingface model is used to build phi then just use the huggingface repo name as the tokenizer path.
-```bash
-python3 main.py --phi
-                --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
-                --phi_tensorrt_path /root/TensorRT-LLM/examples/phi/phi_engine \
-                --phi_tokenizer_path /root/TensorRT-LLM/examples/phi/phi-2
+### Run WhisperBot with Whisper and Mistral/Phi-2
+
+Take the folder path of the Whisper TensorRT model, and the folder_path and tokenizer_path of the Mistral/Phi-2 TensorRT engine, from the build phase. If a Hugging Face model was used to build Mistral/Phi-2, just use the Hugging Face repo name as the tokenizer path.
+
+```{python}
+include_file('docker/scripts/run-whisperbot.sh')
 ```
 
 - On the client side clone the repo, install the requirements and execute `run_client.py`
@@ -122,7 +81,6 @@ pip install -r requirements.txt
 python3 run_client.py
 ```
 
-
 ## Contact Us
 For questions or issues, please open an issue.
 Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
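The `{python}` cells above are executed when README.qmd is rendered; `include_file` is a helper defined in a part of README.qmd not shown in this diff, which splices each script's contents into the output. A minimal sketch of re-rendering the README, assuming Quarto and a Jupyter Python kernel are installed:

```bash
# re-render README.qmd so the include_file() cells pull in the current scripts
quarto render README.qmd --to gfm
```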
docker/Dockerfile ADDED
@@ -0,0 +1,8 @@
+FROM ghcr.io/collabora/whisperbot-base:latest as base
+
+WORKDIR /root
+COPY scripts/setup-whisperbot.sh scripts/run-whisperbot.sh scratch-space/models /root/
+RUN ./setup-whisperbot.sh
+
+CMD ./run-whisperbot.sh
+
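The final image expects `scratch-space/models` to be populated before the build and starts the bot through `run-whisperbot.sh`. A sketch of running it, reusing the GPU and shared-memory flags from `docker/build.sh`; the port mapping is an assumption, since the port `main.py` serves on is not shown in this commit:

```bash
# -p 9090:9090 is an assumption -- substitute whatever port main.py listens on
docker run --gpus all --shm-size 64G -p 9090:9090 ghcr.io/collabora/whisperbot:latest
```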
docker/base-image/Dockerfile ADDED
@@ -0,0 +1,13 @@
+#ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
+#ARG BASE_TAG=23.10-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/cuda
+ARG BASE_TAG=12.2.2-devel-ubuntu22.04
+
+FROM ${BASE_IMAGE}:${BASE_TAG} as base
+
+WORKDIR /root
+COPY install-deps.sh /root
+RUN bash install-deps.sh && rm install-deps.sh
+
+COPY install-trt-llm.sh /root
+RUN bash install-trt-llm.sh && rm install-trt-llm.sh
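Because the base image and tag are plain build arguments, the commented-out NGC PyTorch base can be swapped back in without editing the Dockerfile, for example:

```bash
# build the base image on top of the NGC PyTorch image instead of plain CUDA
docker build \
  --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
  --build-arg BASE_TAG=23.10-py3 \
  -t ghcr.io/collabora/whisperbot-base:latest .
```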
docker/base-image/install-deps.sh ADDED
@@ -0,0 +1,54 @@
+#!/bin/bash -e
+
+apt-get update && apt-get -y install git git-lfs
+git clone --depth=1 -b cuda12.2 https://github.com/makaveli10/TensorRT-LLM.git
+cd TensorRT-LLM
+git checkout main
+git submodule update --init --recursive
+git lfs install
+git lfs pull
+
+# do not reinstall CUDA (our base image provides the same exact versions)
+patch -p1 <<EOF
+diff --git a/docker/common/install_tensorrt.sh b/docker/common/install_tensorrt.sh
+index 2dcb0a6..3a27e03 100644
+--- a/docker/common/install_tensorrt.sh
++++ b/docker/common/install_tensorrt.sh
+@@ -35,19 +35,7 @@ install_ubuntu_requirements() {
+ dpkg -i cuda-keyring_1.0-1_all.deb
+
+ apt-get update
+- if [[ $(apt list --installed | grep libcudnn8) ]]; then
+- apt-get remove --purge -y libcudnn8*
+- fi
+- if [[ $(apt list --installed | grep libnccl) ]]; then
+- apt-get remove --purge -y --allow-change-held-packages libnccl*
+- fi
+- if [[ $(apt list --installed | grep libcublas) ]]; then
+- apt-get remove --purge -y --allow-change-held-packages libcublas*
+- fi
+- CUBLAS_CUDA_VERSION=$(echo $CUDA_VER | sed 's/\./-/g')
+ apt-get install -y --no-install-recommends libcudnn8=${CUDNN_VER} libcudnn8-dev=${CUDNN_VER}
+- apt-get install -y --no-install-recommends libnccl2=${NCCL_VER} libnccl-dev=${NCCL_VER}
+- apt-get install -y --no-install-recommends libcublas-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER} libcublas-dev-${CUBLAS_CUDA_VERSION}=${CUBLAS_VER}
+ apt-get clean
+ rm -rf /var/lib/apt/lists/*
+ }
+EOF
+
+cd docker/common/
+export BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
+export ENV=${ENV:-/etc/shinit_v2}
+bash install_base.sh
+bash install_cmake.sh
+source $ENV
+bash install_ccache.sh
+# later on TensorRT-LLM will force reinstall this version anyways
+pip3 install --extra-index-url https://download.pytorch.org/whl/cu121 torch
+bash install_tensorrt.sh
+bash install_polygraphy.sh
+source $ENV
+
+cd /root/TensorRT-LLM/docker/common/
+bash install_mpi4py.sh
+source $ENV
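The heredoc patch only drops the libnccl2/libcublas reinstall steps because the CUDA devel base image already ships those libraries (libcudnn8 is still installed by `install_tensorrt.sh`). A quick sanity check, not part of the commit, that they are really present:

```bash
# verify the base image provides the libraries the patched installer skips
dpkg -l | grep -E 'libnccl2|libcublas' || echo "expected CUDA libraries not found"
```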
docker/base-image/install-trt-llm.sh ADDED
@@ -0,0 +1,14 @@
+#!/bin/bash -e
+
+export ENV=${ENV:-/etc/shinit_v2}
+source $ENV
+
+cd /root/TensorRT-LLM
+python3 scripts/build_wheel.py --clean --cuda_architectures "89-real;90-real" --trt_root /usr/local/tensorrt
+pip install build/tensorrt_llm-0.7.1-cp310-cp310-linux_x86_64.whl
+mv examples ../TensorRT-LLM-examples
+cd ..
+
+rm -rf TensorRT-LLM
+# we don't need static libraries and they take a lot of space
+(cd /usr && find . -name '*static.a' | grep -v cudart_static | xargs rm -f)
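`--cuda_architectures "89-real;90-real"` restricts the wheel to Ada (compute capability 8.9, e.g. RTX 4090/L4) and Hopper (9.0) GPUs. Building for other hardware means listing its compute capability instead; a hypothetical variant for an Ampere A100 (8.0):

```bash
# same build, targeting A100 (SM 8.0) instead of Ada/Hopper
python3 scripts/build_wheel.py --clean --cuda_architectures "80-real" \
    --trt_root /usr/local/tensorrt
```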
docker/build.sh ADDED
@@ -0,0 +1,14 @@
+#!/bin/bash -e
+
+[ -n "$VERBOSE" ] && ARGS="--progress plain"
+
+(
+  cd base-image &&
+  docker build $ARGS -t ghcr.io/collabora/whisperbot-base:latest .
+)
+
+mkdir -p scratch-space
+cp -r scripts/build-* scratch-space
+#docker run --gpus all --shm-size 64G -v "$PWD"/scratch-space:/root/scratch-space -w /root/scratch-space -it ghcr.io/collabora/whisperbot-base:latest ./build-models.sh
+
+docker build $ARGS -t ghcr.io/collabora/whisperbot:latest .
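Note that `build.sh` resolves `base-image/`, `scripts/` and `scratch-space/` relative to the current directory, so it should be run from `docker/`, and the `docker run` step that would populate `scratch-space/models` is commented out and has to be executed by hand. Typical usage:

```bash
cd docker
VERBOSE=1 ./build.sh   # VERBOSE switches docker build to plain progress output
```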
docker/publish.sh ADDED
@@ -0,0 +1,4 @@
+#!/bin/bash -e
+
+docker push ghcr.io/collabora/whisperbot-base:latest
+docker push ghcr.io/collabora/whisperbot:latest
docker/scripts/build-mistral.sh ADDED
@@ -0,0 +1,18 @@
+#!/bin/bash -e
+
+cd /root/TensorRT-LLM-examples/llama
+
+## Build TensorRT for Mistral with `fp16`
+
+python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
+                --dtype float16 \
+                --remove_input_padding \
+                --use_gpt_attention_plugin float16 \
+                --enable_context_fmha \
+                --use_gemm_plugin float16 \
+                --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
+                --max_input_len 5000 \
+                --max_batch_size 1
+
+mkdir -p /root/scratch-space/models
+cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
docker/scripts/build-models.sh ADDED
@@ -0,0 +1,7 @@
+#!/bin/bash -e
+
+test -f /etc/shinit_v2 && source /etc/shinit_v2
+
+./build-whisper.sh
+# ./build-mistral.sh
+./build-phi-2.sh
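This script is meant to run inside the base image with the scratch space mounted; the commented-out line in `docker/build.sh` shows the intended invocation:

```bash
# run from the docker/ directory once the base image is built
docker run --gpus all --shm-size 64G \
    -v "$PWD"/scratch-space:/root/scratch-space \
    -w /root/scratch-space -it \
    ghcr.io/collabora/whisperbot-base:latest ./build-models.sh
```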
docker/scripts/build-phi-2.sh ADDED
@@ -0,0 +1,25 @@
+#!/bin/bash -e
+
+## Note: Phi is only available in the main branch and hasn't been released yet, so make sure to build TensorRT-LLM from the main branch.
+
+cd /root/TensorRT-LLM-examples/phi
+
+## Build TensorRT for Phi-2 with `fp16`
+
+git lfs install
+phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
+python3 build.py --dtype=float16 \
+                 --log_level=verbose \
+                 --use_gpt_attention_plugin float16 \
+                 --use_gemm_plugin float16 \
+                 --max_batch_size=16 \
+                 --max_input_len=1024 \
+                 --max_output_len=1024 \
+                 --output_dir=phi-2 \
+                 --model_dir="$phi_path" >&1 | tee build.log
+
+dest=/root/scratch-space/models
+mkdir -p "$dest/phi-2/tokenizer"
+cp -r phi-2 "$dest"
+(cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
+cp -r "$phi_path" "$dest/phi-orig-model"
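`huggingface-cli download` prints the local snapshot directory it resolved, which is what `phi_path` captures; pinning `--revision` keeps the build reproducible. Roughly, assuming the default cache location:

```bash
# phi_path points into the local Hugging Face cache, e.g.:
echo "$phi_path"
# ~/.cache/huggingface/hub/models--microsoft--phi-2/snapshots/834565c23f9b...
```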
setup/setup-tensorrt-llm.sh β†’ docker/scripts/build-whisper.sh RENAMED
@@ -1,13 +1,13 @@
-#!/bin/bash
+#!/bin/bash -e
 
 ## Change working dir to the [whisper example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
-cd TensorRT-LLM/examples/whisper
+cd /root/TensorRT-LLM-examples/whisper
 
 ## Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.
 ## Download the required assets
 
 # the sound filter definitions
-wget --directory-prefix=assets assets/mel_filters.npz https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
+wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
 # the small.en model weights
 wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
 
@@ -28,3 +28,6 @@ EOF
 ## Finally we can build the TensorRT engine for the `small.en` Whisper model:
 pip install -r requirements.txt
 python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
+
+mkdir -p /root/scratch-space/models
+cp -r whisper_small_en /root/scratch-space/models
docker/scripts/run-whisperbot.sh ADDED
@@ -0,0 +1,16 @@
+#!/bin/bash -e
+
+test -f /etc/shinit_v2 && source /etc/shinit_v2
+
+cd WhisperBot
+if [ "$1" != "mistral" ]; then
+  exec python3 main.py --phi \
+    --whisper_tensorrt_path /root/whisper_small_en \
+    --phi_tensorrt_path /root/phi-2 \
+    --phi_tokenizer_path /root/phi-2
+else
+  exec python3 main.py --mistral \
+    --whisper_tensorrt_path /root/models/whisper_small_en \
+    --mistral_tensorrt_path /root/models/mistral \
+    --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
+fi
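The script defaults to the Phi-2 backend and only switches to Mistral when `mistral` is passed as the first argument:

```bash
./run-whisperbot.sh           # Whisper + Phi-2 (the default branch)
./run-whisperbot.sh mistral   # Whisper + Mistral
```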
docker/scripts/setup-whisperbot.sh ADDED
@@ -0,0 +1,32 @@
+#!/bin/bash -e
+
+## Clone this repo and install requirements
+[ -d "WhisperBot" ] || git clone https://github.com/collabora/WhisperBot.git
+
+cd WhisperBot
+apt update
+apt install ffmpeg portaudio19-dev -y
+
+## NVIDIA containers are based on unreleased PyTorch versions so we have to manually install
+## torchaudio from source (`pip install torchaudio` would pull in all-new PyTorch and CUDA versions)
+#apt install -y cmake
+#TORCH_CUDA_ARCH_LIST="8.9 9.0" pip install --no-build-isolation git+https://github.com/pytorch/audio.git
+
+## Install all the other dependencies normally
+pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
+pip install -r requirements.txt
+pip install openai-whisper whisperspeech soundfile
+
+## force update huggingface_hub (tokenizers 0.14.1 spuriously requires an ancient <=0.18 version)
+pip install -U huggingface_hub
+
+huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
+huggingface-cli download charactr/vocos-encodec-24khz
+
+mkdir -p /root/.cache/torch/hub/checkpoints/
+curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
+mkdir -p /root/.cache/whisper-live/
+curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
+
+python -c 'from transformers.utils.hub import move_cache; move_cache()'
+
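Since the pinned torchaudio wheel has to match the PyTorch and CUDA versions already in the container, a quick post-install check (not part of the commit) can catch mismatches early:

```bash
# print the torch/torchaudio versions and the CUDA toolkit torch was built against
python3 -c 'import torch, torchaudio; print(torch.__version__, torchaudio.__version__, torch.version.cuda)'
```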