makaveli10 committed
Commit fc01fde
1 Parent(s): 25adfbc

update README

Files changed (1): README.md +28 -206
README.md CHANGED
@@ -10,10 +10,10 @@ Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of
 the [WhisperLive](https://github.com/collabora/WhisperLive) and
 [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
 integrating Mistral, a Large Language Model (LLM), on top of the
- real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper,
- a powerful automatic speech recognition (ASR) system. Both Mistral and
+ real-time speech-to-text pipeline. Both the LLM and
 Whisper are optimized to run efficiently as TensorRT engines, maximizing
- performance and real-time processing capabilities.
+ performance and real-time processing capabilities. WhisperSpeech, in turn,
+ is optimized with torch.compile.

 ## Features
@@ -24,211 +24,33 @@ performance and real-time processing capabilities.
 Model, to enhance the understanding and context of the transcribed
 text.

- - **TensorRT Optimization**: Both Mistral and Whisper are optimized to
+ - **TensorRT Optimization**: Both the LLM and Whisper are optimized to
 run as TensorRT engines, ensuring high-performance and low-latency
 processing.
-
- ## Prerequisites
-
- Install
- [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md)
- to build the Whisper and Mistral TensorRT engines. That README builds a
- docker image for TensorRT-LLM. Instead of building a docker image, we
- can also follow the README and the
- [Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi)
- to install the required packages in the base PyTorch docker image. Just
- make sure to use the correct base image as mentioned in the Dockerfile,
- and everything should go smoothly.
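
(For reference, a minimal sketch of that container build: it assumes the
`release_build` target in TensorRT-LLM's `docker/Makefile`; check the
TensorRT-LLM README for the current target name before relying on it.)

``` bash
# Sketch only: build the TensorRT-LLM docker image from source.
# The release_build target comes from TensorRT-LLM's docker/Makefile
# and may change between releases.
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
make -C docker release_build
```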
-
- ### Build Whisper TensorRT Engine
-
- > [!NOTE]
- >
- > These steps are included in `docker/scripts/build-whisper.sh`
-
- Change the working directory to the [whisper example
- dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
- in TensorRT-LLM.
-
- ``` bash
- cd /root/TensorRT-LLM-examples/whisper
- ```
-
- Currently, by default TensorRT-LLM only supports `large-v2` and
- `large-v3`. In this repo, we use `small.en`.
-
- Download the required assets:
-
- ``` bash
- # the sound filter definitions
- wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
- # the small.en model weights
- wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
- ```
-
- We have to patch the script to add support for our model size
- (`small.en`):
-
- ``` bash
- patch <<EOF
- --- build.py.old 2024-01-17 17:47:47.508545842 +0100
- +++ build.py 2024-01-17 17:47:41.404941926 +0100
- @@ -58,6 +58,7 @@
-      choices=[
-          "large-v3",
-          "large-v2",
- +        "small.en",
-      ])
-  parser.add_argument('--quantize_dir', type=str, default="quantize/1-gpu")
-  parser.add_argument('--dtype',
- EOF
- ```
-
- Finally, we can build the TensorRT engine for the `small.en` Whisper
- model:
-
- ``` bash
- pip install -r requirements.txt
- python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
- mkdir -p /root/scratch-space/models
- cp -r whisper_small_en /root/scratch-space/models
- ```
-
- ### Build Mistral TensorRT Engine
-
- > [!NOTE]
- >
- > These steps are included in `docker/scripts/build-mistral.sh`
-
- ``` bash
- cd /root/TensorRT-LLM-examples/llama
- ```
-
- Build TensorRT for Mistral with `fp16`:
-
- ``` bash
- python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
-     --dtype float16 \
-     --remove_input_padding \
-     --use_gpt_attention_plugin float16 \
-     --enable_context_fmha \
-     --use_gemm_plugin float16 \
-     --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
-     --max_input_len 5000 \
-     --max_batch_size 1
- mkdir -p /root/scratch-space/models
- cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
- ```
-
- ### Build Phi TensorRT Engine
-
- > [!NOTE]
- >
- > These steps are included in `docker/scripts/build-phi-2.sh`
-
- Note: Phi is only available on the main branch and hasn't been released
- yet, so make sure to build TensorRT-LLM from the main branch.
-
- ``` bash
- cd /root/TensorRT-LLM-examples/phi
- ```
-
- Build TensorRT for Phi-2 with `fp16`:
-
- ``` bash
- git lfs install
- phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
- python3 build.py --dtype=float16 \
-     --log_level=verbose \
-     --use_gpt_attention_plugin float16 \
-     --use_gemm_plugin float16 \
-     --max_batch_size=16 \
-     --max_input_len=1024 \
-     --max_output_len=1024 \
-     --output_dir=phi-2 \
-     --model_dir="$phi_path" 2>&1 | tee build.log
- dest=/root/scratch-space/models
- mkdir -p "$dest/phi-2/tokenizer"
- cp -r phi-2 "$dest"
- (cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
- cp -r "$phi_path" "$dest/phi-orig-model"
- ```
-
- ## Build WhisperFusion
-
- > [!NOTE]
- >
- > These steps are included in `docker/scripts/setup-whisperfusion.sh`
-
- Clone this repo and install the requirements:
-
- ``` bash
- [ -d "WhisperFusion" ] || git clone https://github.com/collabora/WhisperFusion.git
- cd WhisperFusion
- apt update
- apt install ffmpeg portaudio19-dev -y
- ```
-
- Install torchaudio matching the PyTorch from the base image:
-
- ``` bash
- pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
- ```
-
- Install all the other dependencies normally:
-
- ``` bash
- pip install -r requirements.txt
- ```
-
- Force-update huggingface_hub (tokenizers 0.14.1 spuriously requires an
- ancient \<=0.18 version):
-
- ``` bash
- pip install -U huggingface_hub
- huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
- huggingface-cli download charactr/vocos-encodec-24khz
- mkdir -p /root/.cache/torch/hub/checkpoints/
- curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
- mkdir -p /root/.cache/whisper-live/
- curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
- python -c 'from transformers.utils.hub import move_cache; move_cache()'
- ```
-
- ### Run WhisperFusion with Whisper and Mistral/Phi-2
-
- Take the folder path of the Whisper TensorRT model, and the folder_path
- and tokenizer_path of the Mistral/Phi-2 TensorRT engine, from the build
- phase. If a Hugging Face model was used to build Mistral/Phi-2, just use
- the Hugging Face repo name as the tokenizer path.
-
- > [!NOTE]
- >
- > These steps are included in `docker/scripts/run-whisperfusion.sh`
-
- ``` bash
- test -f /etc/shinit_v2 && source /etc/shinit_v2
- cd WhisperFusion
- if [ "$1" != "mistral" ]; then
-   exec python3 main.py --phi \
-     --whisper_tensorrt_path /root/whisper_small_en \
-     --phi_tensorrt_path /root/phi-2 \
-     --phi_tokenizer_path /root/phi-2
- else
-   exec python3 main.py --mistral \
-     --whisper_tensorrt_path /root/models/whisper_small_en \
-     --mistral_tensorrt_path /root/models/mistral \
-     --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
- fi
- ```
-
- - On the client side, clone the repo, install the requirements, and
-   execute `run_client.py`:
-
- ``` bash
- cd WhisperFusion
- pip install -r requirements.txt
- python3 run_client.py
+ - **torch.compile**: WhisperSpeech uses torch.compile to speed up
+ inference; torch.compile makes PyTorch code run faster by JIT-compiling
+ it into optimized kernels (see the example below).
+
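As a quick, illustrative aside (not part of the project's scripts):
torch.compile wraps an ordinary PyTorch callable and compiles it on first
call. A minimal sketch, assuming PyTorch >= 2.0 in the environment:

```bash
python3 - <<'EOF'
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

cf = torch.compile(f)   # JIT-compiles f into optimized kernels on first call
x = torch.randn(1000)
print(torch.allclose(cf(x), f(x)))  # same result as eager execution
EOF
```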
+ ## Getting Started
+ - We provide a pre-built TensorRT-LLM docker container that has both
+ Whisper and Phi converted to TensorRT engines, and the WhisperSpeech
+ model pre-downloaded, so you can start interacting with WhisperFusion
+ right away.
+ ```bash
+ docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest
+ ```
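Here `--gpus all` gives the container access to the host GPUs, and the `-p`
flags publish the container's ports 6006 and 8888 on the host; remap them if
those ports are already in use.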
+
+ - Start the Web GUI:
+ ```bash
+ cd examples/chatbot/html
+ python -m http.server
+ ```
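`python -m http.server` serves on port 8000 by default, so the GUI is then
reachable at http://localhost:8000.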
+
+ ## Build Docker Image
+ - We provide docker images for CUDA architectures 89 and 90. If you have
+ a GPU with a different CUDA architecture, pass it to the build script.
+ For example, to build for an RTX 3090 with CUDA architecture 86:
+ ```bash
+ bash build.sh 86-real
  ```
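The architecture numbers are NVIDIA CUDA compute capabilities: 86 covers
Ampere consumer GPUs such as the RTX 3090, 89 Ada GPUs such as the RTX 4090,
and 90 Hopper GPUs such as the H100. The `-real` suffix follows CMake's
`CUDA_ARCHITECTURES` convention of generating device code only for that
exact architecture.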
 
  ## Contact Us