Commit fc01fde by makaveli10 (1 parent: 25adfbc)
update README
README.md
CHANGED
@@ -10,10 +10,10 @@ Welcome to WhisperFusion. WhisperFusion builds upon the capabilities of
|
|
10 |
the [WhisperLive](https://github.com/collabora/WhisperLive) and
|
11 |
[WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
|
12 |
integrating Mistral, a Large Language Model (LLM), on top of the
|
13 |
-
real-time speech-to-text pipeline.
|
14 |
-
a powerful automatic speech recognition (ASR) system. Both Mistral and
|
15 |
Whisper are optimized to run efficiently as TensorRT engines, maximizing
|
16 |
-
performance and real-time processing capabilities.
|
|
|
17 |
|
18 |
## Features
|
19 |
|
@@ -24,211 +24,33 @@ performance and real-time processing capabilities.
|
|
24 |
Model, to enhance the understanding and context of the transcribed
|
25 |
text.
|
26 |
|
27 |
-
- **TensorRT Optimization**: Both
|
28 |
run as TensorRT engines, ensuring high-performance and low-latency
|
29 |
processing.
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
-
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
-
|
51 |
-
|
52 |
-
|
53 |
-
|
54 |
-
cd /root/TensorRT-LLM-examples/whisper
|
55 |
-
```
|
56 |
-
|
57 |
-
Currently, by default TensorRT-LLM only supports `large-v2` and
|
58 |
-
`large-v3`. In this repo, we use `small.en`.
|
59 |
-
|
60 |
-
Download the required assets
|
61 |
-
|
62 |
-
``` bash
|
63 |
-
# the sound filter definitions
|
64 |
-
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
|
65 |
-
# the small.en model weights
|
66 |
-
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
|
67 |
-
```
|
68 |
-
|
69 |
-
We have to patch the script to add support for our model size
|
70 |
-
(`small.en`):
|
71 |
-
|
72 |
-
``` bash
|
73 |
-
patch <<EOF
|
74 |
-
--- build.py.old 2024-01-17 17:47:47.508545842 +0100
|
75 |
-
+++ build.py 2024-01-17 17:47:41.404941926 +0100
|
76 |
-
@@ -58,6 +58,7 @@
|
77 |
-
choices=[
|
78 |
-
"large-v3",
|
79 |
-
"large-v2",
|
80 |
-
+ "small.en",
|
81 |
-
])
|
82 |
-
parser.add_argument('--quantize_dir', type=str, default="quantize/1-gpu")
|
83 |
-
parser.add_argument('--dtype',
|
84 |
-
EOF
|
85 |
-
```
|
86 |
-
|
87 |
-
Finally we can build the TensorRT engine for the `small.en` Whisper
|
88 |
-
model:
|
89 |
-
|
90 |
-
``` bash
|
91 |
-
pip install -r requirements.txt
|
92 |
-
python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
|
93 |
-
mkdir -p /root/scratch-space/models
|
94 |
-
cp -r whisper_small_en /root/scratch-space/models
|
95 |
-
```
|
96 |
-
|
97 |
-
### Build Mistral TensorRT Engine
|
98 |
-
|
99 |
-
> [!NOTE]
|
100 |
-
>
|
101 |
-
> These steps are included in `docker/scripts/build-mistral.sh`
|
102 |
-
|
103 |
-
``` bash
|
104 |
-
cd /root/TensorRT-LLM-examples/llama
|
105 |
-
```
|
106 |
-
|
107 |
-
Build TensorRT for Mistral with `fp16`
|
108 |
-
|
109 |
-
``` bash
|
110 |
-
python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
|
111 |
-
--dtype float16 \
|
112 |
-
--remove_input_padding \
|
113 |
-
--use_gpt_attention_plugin float16 \
|
114 |
-
--enable_context_fmha \
|
115 |
-
--use_gemm_plugin float16 \
|
116 |
-
--output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
|
117 |
-
--max_input_len 5000 \
|
118 |
-
--max_batch_size 1
|
119 |
-
mkdir -p /root/scratch-space/models
|
120 |
-
cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
|
121 |
-
```
|
122 |
-
|
123 |
-
### Build Phi TensorRT Engine
|
124 |
-
|
125 |
-
> [!NOTE]
|
126 |
-
>
|
127 |
-
> These steps are included in `docker/scripts/build-phi-2.sh`
|
128 |
-
|
129 |
-
Note: Phi is only available in the main branch and hasn't been released yet.
|
130 |
-
So, make sure to build TensorRT-LLM from the main branch.
|
131 |
-
|
132 |
-
``` bash
|
133 |
-
cd /root/TensorRT-LLM-examples/phi
|
134 |
-
```
|
135 |
-
|
136 |
-
Build TensorRT for Phi-2 with `fp16`
|
137 |
-
|
138 |
-
``` bash
|
139 |
-
git lfs install
|
140 |
-
phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
|
141 |
-
python3 build.py --dtype=float16 \
|
142 |
-
--log_level=verbose \
|
143 |
-
--use_gpt_attention_plugin float16 \
|
144 |
-
--use_gemm_plugin float16 \
|
145 |
-
--max_batch_size=16 \
|
146 |
-
--max_input_len=1024 \
|
147 |
-
--max_output_len=1024 \
|
148 |
-
--output_dir=phi-2 \
|
149 |
-
--model_dir="$phi_path" 2>&1 | tee build.log
|
150 |
-
dest=/root/scratch-space/models
|
151 |
-
mkdir -p "$dest/phi-2/tokenizer"
|
152 |
-
cp -r phi-2 "$dest"
|
153 |
-
(cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
|
154 |
-
cp -r "$phi_path" "$dest/phi-orig-model"
|
155 |
-
```
|
156 |
-
|
157 |
-
## Build WhisperFusion
|
158 |
-
|
159 |
-
> [!NOTE]
|
160 |
-
>
|
161 |
-
> These steps are included in `docker/scripts/setup-whisperfusion.sh`
|
162 |
-
|
163 |
-
Clone this repo and install requirements
|
164 |
-
|
165 |
-
``` bash
|
166 |
-
[ -d "WhisperFusion" ] || git clone https://github.com/collabora/WhisperFusion.git
|
167 |
-
cd WhisperFusion
|
168 |
-
apt update
|
169 |
-
apt install ffmpeg portaudio19-dev -y
|
170 |
-
```
|
171 |
-
|
172 |
-
Install torchaudio matching the PyTorch from the base image
|
173 |
-
|
174 |
-
``` bash
|
175 |
-
pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
|
176 |
-
```
|
177 |
-
|
178 |
-
Install all the other dependencies normally
|
179 |
-
|
180 |
-
``` bash
|
181 |
-
pip install -r requirements.txt
|
182 |
-
```
|
183 |
-
|
184 |
-
Force-update huggingface_hub (tokenizers 0.14.1 spuriously requires an
|
185 |
-
ancient \<=0.18 version)
|
186 |
-
|
187 |
-
``` bash
|
188 |
-
pip install -U huggingface_hub
|
189 |
-
huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
|
190 |
-
huggingface-cli download charactr/vocos-encodec-24khz
|
191 |
-
mkdir -p /root/.cache/torch/hub/checkpoints/
|
192 |
-
curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
|
193 |
-
mkdir -p /root/.cache/whisper-live/
|
194 |
-
curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
|
195 |
-
python -c 'from transformers.utils.hub import move_cache; move_cache()'
|
196 |
-
```
|
197 |
-
|
198 |
-
### Run WhisperFusion with Whisper and Mistral/Phi-2
|
199 |
-
|
200 |
-
Take the folder path of the Whisper TensorRT model, and the folder path and
|
201 |
-
tokenizer path of the Mistral/Phi-2 TensorRT model, from the build phase. If a
|
202 |
-
Hugging Face model was used to build Mistral/Phi-2, just use the
|
203 |
-
Hugging Face repo name as the tokenizer path.
|
204 |
-
|
205 |
-
> [!NOTE]
|
206 |
-
>
|
207 |
-
> These steps are included in `docker/scripts/run-whisperfusion.sh`
|
208 |
-
|
209 |
-
``` bash
|
210 |
-
test -f /etc/shinit_v2 && source /etc/shinit_v2
|
211 |
-
cd WhisperFusion
|
212 |
-
if [ "$1" != "mistral" ]; then
|
213 |
-
exec python3 main.py --phi \
|
214 |
-
--whisper_tensorrt_path /root/whisper_small_en \
|
215 |
-
--phi_tensorrt_path /root/phi-2 \
|
216 |
-
--phi_tokenizer_path /root/phi-2
|
217 |
-
else
|
218 |
-
exec python3 main.py --mistral \
|
219 |
-
--whisper_tensorrt_path /root/models/whisper_small_en \
|
220 |
-
--mistral_tensorrt_path /root/models/mistral \
|
221 |
-
--mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
|
222 |
-
fi
|
223 |
-
```
|
224 |
-
|
225 |
-
- On the client side clone the repo, install the requirements and
|
226 |
-
execute `run_client.py`
|
227 |
-
|
228 |
-
``` bash
|
229 |
-
cd WhisperFusion
|
230 |
-
pip install -r requirements.txt
|
231 |
-
python3 run_client.py
|
232 |
```
|
233 |
|
234 |
## Contact Us
|
|
|
10 |
the [WhisperLive](https://github.com/collabora/WhisperLive) and
|
11 |
[WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
|
12 |
integrating Mistral, a Large Language Model (LLM), on top of the
|
13 |
+
real-time speech-to-text pipeline. Both the LLM and
|
|
|
14 |
Whisper are optimized to run efficiently as TensorRT engines, maximizing
|
15 |
+
performance and real-time processing capabilities, while WhisperSpeech is
|
16 |
+
optimized with torch.compile.
|
17 |
|
18 |
## Features
|
19 |
|
|
|
24 |
Model, to enhance the understanding and context of the transcribed
|
25 |
text.
|
26 |
|
27 |
+
- **TensorRT Optimization**: Both the LLM and Whisper are optimized to
|
28 |
run as TensorRT engines, ensuring high-performance and low-latency
|
29 |
processing.
|
30 |
+
- **torch.compile**: WhisperSpeech uses torch.compile to speed up
|
31 |
+
inference; torch.compile makes PyTorch code run faster by JIT-compiling it
|
32 |
+
into optimized kernels.
|
33 |
+
|
34 |
+
## Getting Started
|
35 |
+
- We provide a pre-built TensorRT-LLM Docker container in which Whisper and
|
36 |
+
Phi are already converted to TensorRT engines and the WhisperSpeech model is pre-downloaded, so
|
37 |
+
you can quickly start interacting with WhisperFusion.
|
38 |
+
```bash
|
39 |
+
docker run --gpus all --shm-size 64G -p 6006:6006 -p 8888:8888 -it ghcr.io/collabora/whisperfusion:latest
|
40 |
+
```
|
41 |
+
|
42 |
+
- Start Web GUI
|
43 |
+
```bash
|
44 |
+
cd examples/chatbot/html
|
45 |
+
python -m http.server
|
46 |
+
```
|
47 |
+
|
48 |
+
## Build Docker Image
|
49 |
+
- We provide the Docker image for CUDA architectures 89 and 90. If you have a GPU
|
50 |
+
with a different CUDA architecture, build the image for it. For example, to build for an RTX 3090 with CUDA
|
51 |
+
architecture 86:
|
52 |
+
```bash
|
53 |
+
bash build.sh 86-real
|
54 |
```
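The architecture argument in the build command above appears to be the GPU's compute capability with the dot removed and a `-real` suffix (86 for the RTX 3090 becomes `86-real`). The sketch below derives that string from a compute capability such as `8.6`. The helper name `cuda_arch_arg` is hypothetical and not part of the WhisperFusion repo, and it is an assumption that `build.sh` accepts other architectures in the same form.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the WhisperFusion repo): convert a CUDA compute
# capability like "8.6" into the "<arch>-real" form used in the example above.
cuda_arch_arg() {
  local cap="$1"
  # Accept only "<major>.<minor>" made of digits, e.g. 8.6 or 9.0.
  if [[ ! "$cap" =~ ^[0-9]+\.[0-9]+$ ]]; then
    echo "unexpected compute capability: $cap" >&2
    return 1
  fi
  # Drop the dot and append the "-real" suffix: 8.6 -> 86-real.
  echo "${cap/./}-real"
}

cuda_arch_arg 8.6   # prints 86-real
```

On a machine with a sufficiently recent NVIDIA driver, the compute capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, and the result passed to the helper before calling `bash build.sh`.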
|
55 |
|
56 |
## Contact Us
|