jpc committed
Commit
19da603
1 Parent(s): 0733173

Prototype the new executable documentation format

Files changed (3)
  1. README.md +43 -11
  2. README.qmd +128 -0
  3. setup/setup-tensorrt-llm.sh +30 -0
README.md CHANGED
@@ -13,21 +13,53 @@
  Instead of building a docker image, we can also refer to the README and the [Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) to install the required packages in the base pytorch docker image. Just make sure to use the correct base image as mentioned in the Dockerfile and everything should go smoothly.
 
  ### Build Whisper TensorRT Engine
- - Change working dir to the [whisper example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
- ```bash
+
+ > [!NOTE]
+ >
+ > These steps are included in `setup/setup-tensorrt-llm.sh`
+
+ Change working dir to the [whisper example
+ dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
+ in TensorRT-LLM.
+
+ ```bash
  cd TensorRT-LLM/examples/whisper
  ```
- - Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.
- - Download the required assets.
- ```bash
- wget --directory-prefix=assets assets/mel_filters.npz https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
- # small.en model
+
+ Currently, by default TensorRT-LLM only supports `large-v2` and
+ `large-v3`. In this repo, we use `small.en`.
+
+ Download the required assets
+
+ ```bash
+ # the sound filter definitions
+ wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
+ # the small.en model weights
  wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
  ```
- - Edit `build.py` to support `small.en`. In order to do that, add `"small.en"` as an item in the list [`choices`](https://github.com/NVIDIA/TensorRT-LLM/blob/a75618df24e97ecf92b8899ca3c229c4b8097dda/examples/whisper/build.py#L58).
- - Build `small.en` TensorRT engine.
- ```bash
+
+ We have to patch the script to add support for our model size
+ (`small.en`):
+
+ ```bash
+ patch <<EOF
+ --- build.py.old	2024-01-17 17:47:47.508545842 +0100
+ +++ build.py	2024-01-17 17:47:41.404941926 +0100
+ @@ -58,6 +58,7 @@
+          choices=[
+              "large-v3",
+              "large-v2",
+ +            "small.en",
+          ])
+      parser.add_argument('--quantize_dir', type=str, default="quantize/1-gpu")
+      parser.add_argument('--dtype',
+ EOF
+ ```
+
+ Finally we can build the TensorRT engine for the `small.en` Whisper
+ model:
+
+ ```bash
  pip install -r requirements.txt
  python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
  ```
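The patch above amounts to a one-line insertion into the `choices` list. As a sanity check of the edit itself, the same insertion can be sketched in Python; the `snippet` string below is a hypothetical stand-in for the relevant lines of `build.py`, not its actual contents:

```python
# Mirror of the patch above: insert "small.en" after the "large-v2" entry.
# `snippet` is a stand-in for the choices list in build.py.
snippet = '''        choices=[
            "large-v3",
            "large-v2",
        ])'''

patched = snippet.replace('"large-v2",', '"large-v2",\n            "small.en",')
print(patched)
```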
README.qmd ADDED
@@ -0,0 +1,128 @@
+ ---
+ format: gfm
+ execute:
+   echo: false
+   output: asis
+ ---
+
+ ```{python}
+ #| include: false
+ def include_file(fname):
+     with open(fname) as f:
+         print(f'''
+ :::{{.callout-note}}
+ These steps are included in `{fname}`
+ :::
+ ''')
+         code = False
+         for l in f:
+             if l.startswith('#!'):
+                 continue
+             if l.startswith('## '):
+                 if code: print("```"); code = False
+                 print(l[3:])
+             elif l.strip():
+                 if not code: print("```bash"); code = True
+                 print(l.rstrip())
+         if code: print("```")
+ ```
+
+ # WhisperBot
+
+ Welcome to WhisperBot. WhisperBot builds upon the capabilities of [WhisperLive]() by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.
+
+ ## Features
+ - **Real-Time Speech-to-Text**: Utilizes OpenAI WhisperLive to convert spoken language into text in real time.
+
+ - **Large Language Model Integration**: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.
+
+ - **TensorRT Optimization**: Both Mistral and Whisper are optimized to run as TensorRT engines, ensuring high-performance and low-latency processing.
+
+ ## Prerequisites
+ Install [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md) to build Whisper and Mistral TensorRT engines. The README builds a docker image for TensorRT-LLM.
+ Instead of building a docker image, we can also refer to the README and the [Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) to install the required packages in the base pytorch docker image. Just make sure to use the correct base image as mentioned in the Dockerfile and everything should go smoothly.
+
+ ### Build Whisper TensorRT Engine
+
+ ```{python}
+ include_file('setup/setup-tensorrt-llm.sh')
+ ```
+
+ ### Build Mistral TensorRT Engine
+ - Change working dir to the [llama example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama) in the TensorRT-LLM folder.
+ ```bash
+ cd TensorRT-LLM/examples/llama
+ ```
+ - Convert Mistral to an `fp16` TensorRT engine.
+ ```bash
+ python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
+                 --dtype float16 \
+                 --remove_input_padding \
+                 --use_gpt_attention_plugin float16 \
+                 --enable_context_fmha \
+                 --use_gemm_plugin float16 \
+                 --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
+                 --max_input_len 5000 \
+                 --max_batch_size 1
+ ```
+
+ ### Build Phi TensorRT Engine
+ Note: Phi is only available on the main branch and hasn't been released yet, so make sure to build TensorRT-LLM from the main branch.
+ - Change working dir to the [phi example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) in the TensorRT-LLM folder.
+ ```bash
+ cd TensorRT-LLM/examples/phi
+ ```
+ - Build the Phi TensorRT engine.
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/microsoft/phi-2
+ python3 build.py --dtype=float16 \
+                  --log_level=verbose \
+                  --use_gpt_attention_plugin float16 \
+                  --use_gemm_plugin float16 \
+                  --max_batch_size=16 \
+                  --max_input_len=1024 \
+                  --max_output_len=1024 \
+                  --output_dir=phi_engine \
+                  --model_dir=phi-2 2>&1 | tee build.log
+ ```
+
+ ## Run WhisperBot
+ - Clone this repo and install the requirements.
+ ```bash
+ git clone https://github.com/collabora/WhisperBot.git
+ cd WhisperBot
+ apt update
+ apt install ffmpeg portaudio19-dev -y
+ pip install -r requirements.txt
+ ```
+
+ ### Whisper + Mistral
+ - Take the Whisper TensorRT engine folder path, and the Mistral TensorRT engine folder path and tokenizer path, from the build phase. If a Hugging Face model was used to build Mistral, just use the Hugging Face repo name as the tokenizer path.
+ ```bash
+ python3 main.py --mistral \
+                 --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
+                 --mistral_tensorrt_path /root/TensorRT-LLM/examples/llama/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
+                 --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
+ ```
+
+ ### Whisper + Phi
+ - Take the Whisper TensorRT engine folder path, and the Phi TensorRT engine folder path and tokenizer path, from the build phase. If a Hugging Face model was used to build Phi, just use the Hugging Face repo name as the tokenizer path.
+ ```bash
+ python3 main.py --phi \
+                 --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
+                 --phi_tensorrt_path /root/TensorRT-LLM/examples/phi/phi_engine \
+                 --phi_tokenizer_path /root/TensorRT-LLM/examples/phi/phi-2
+ ```
+
+ - On the client side, clone the repo, install the requirements, and execute `run_client.py`.
+ ```bash
+ cd WhisperBot
+ pip install -r requirements.txt
+ python3 run_client.py
+ ```
+
+ ## Contact Us
+ For questions or issues, please open an issue.
+ Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
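The `include_file` helper in `README.qmd` above converts the setup shell script into markdown: `## ` comments become prose and other non-blank lines are wrapped in bash fences. A standalone sketch of the same logic, runnable outside Quarto (the `shell_to_markdown` name and the sample script are ours, for illustration):

```python
FENCE = '`' * 3  # spelled this way so the literal backticks don't break this README's own fencing

def shell_to_markdown(lines):
    """Sketch of README.qmd's include_file: '## ' comments -> prose,
    other non-blank lines -> collected into bash-fenced code blocks."""
    out, code = [], False
    for l in lines:
        if l.startswith('#!'):        # drop the shebang
            continue
        if l.startswith('## '):       # doc comment becomes prose
            if code:
                out.append(FENCE)
                code = False
            out.append(l[3:].rstrip())
        elif l.strip():               # command goes inside a bash fence
            if not code:
                out.append(FENCE + 'bash')
                code = True
            out.append(l.rstrip())
    if code:
        out.append(FENCE)
    return '\n'.join(out)

sample = [
    '#!/bin/bash\n',
    '## Download the model weights\n',
    'wget --directory-prefix=assets example.pt\n',
]
print(shell_to_markdown(sample))
```

Running this prints the comment as a prose line followed by the `wget` command inside a bash fence, mirroring what the Quarto cell renders into README.md.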
setup/setup-tensorrt-llm.sh ADDED
@@ -0,0 +1,30 @@
+ #!/bin/bash
+
+ ## Change working dir to the [whisper example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
+ cd TensorRT-LLM/examples/whisper
+
+ ## Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.
+ ## Download the required assets
+
+ # the sound filter definitions
+ wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
+ # the small.en model weights
+ wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
+
+ ## We have to patch the script to add support for our model size (`small.en`):
+ patch <<EOF
+ --- build.py.old	2024-01-17 17:47:47.508545842 +0100
+ +++ build.py	2024-01-17 17:47:41.404941926 +0100
+ @@ -58,6 +58,7 @@
+          choices=[
+              "large-v3",
+              "large-v2",
+ +            "small.en",
+          ])
+      parser.add_argument('--quantize_dir', type=str, default="quantize/1-gpu")
+      parser.add_argument('--dtype',
+ EOF
+
+ ## Finally we can build the TensorRT engine for the `small.en` Whisper model:
+ pip install -r requirements.txt
+ python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
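The long hex segment in the `small.en` download URL looks like a SHA-256 digest, which would allow verifying the downloaded weights. A hedged sketch of that check; the assumption that the URL path embeds the file's SHA-256 is our reading, not something the script states, and `expected_sha256`/`verify_download` are names we introduce here:

```python
import hashlib
import re

# Assumption: the 64-hex-char path segment of the model URL is the SHA-256
# of the weights file.
URL = ("https://openaipublic.azureedge.net/main/whisper/models/"
       "f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt")

def expected_sha256(url):
    """Extract the 64-hex-char path segment from the URL, if present."""
    m = re.search(r'/([0-9a-f]{64})/', url)
    return m.group(1) if m else None

def verify_download(path, url):
    """Compare a local file's SHA-256 against the digest embedded in the URL."""
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest == expected_sha256(url)

print(expected_sha256(URL))  # prints the digest parsed out of the URL
```

After the `wget` above, `verify_download('assets/small.en.pt', URL)` would confirm the file arrived intact, under that assumption.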