Commit a853f7e by happyme531 (parent: 2ef3e1d): Update README.md
---
base_model:
- Qwen/Qwen2-Audio-7B-Instruct
tags:
- rknn
- rkllm
license: agpl-3.0
---

# Qwen2-Audio-7B-Instruct-rkllm

## (English README see below)

在RK3588上运行强大的Qwen2-Audio-7B-Instruct音频大模型!

- 推理速度(RK3588, 输入10秒音频): 音频编码器 12.2s(单核NPU) + LLM 填充 4.4s (282 tokens / 64.7 tps) + 解码 3.69 tps
- 内存占用(RK3588, 上下文长度768): 11.6GB

## 使用方法

1. 克隆或者下载此仓库到本地. 模型较大, 请确保有足够的磁盘空间.

2. 开发板的RKNPU2内核驱动版本必须>=0.9.6才能运行这么大的模型.
使用root权限运行以下命令检查驱动版本:
```bash
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
```
如果版本过低, 请更新驱动. 你可能需要更新内核, 或查找官方文档以获取帮助.

3. 安装依赖

```bash
pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers
```

4. 运行

```bash
python multiprocess_inference.py
```

如果实测性能不理想, 可以调整CPU调度器让CPU始终运行在最高频率, 并把推理程序绑定到大核(`taskset -c 4-7 python multiprocess_inference.py`).

如果出现llvm相关的错误报错, 请更新llvmlite库: `pip install --upgrade llvmlite`

>```
>W rknn-toolkit-lite2 version: 2.3.0
>Start loading audio encoder model (size: 1300.25 MB)
>Start loading language model (size: 8037.93 MB)
>I rkllm: rkllm-runtime version: 1.1.2, rknpu driver version: 0.9.8, platform: RK3588
>
>Audio encoder loaded in 13.65 seconds
>I RKNN: [20:30:05.616] RKNN Runtime Information, librknnrt version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
>I RKNN: [20:30:05.616] RKNN Driver Information, version: 0.9.8
>I RKNN: [20:30:05.617] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
>W RKNN: [20:30:07.950] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
>W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
>Received ready signal: audio_ready
>Language model loaded in 9.94 seconds
>Received ready signal: llm_ready
>All models loaded, starting interactive mode...
>
>Enter your input (3 empty lines to start inference, Ctrl+C to exit, for example:
>这是什么声音{{./jntm.mp3}}?
>What kind of sound is in {{./test.mp3}}?
>Describe the audio in {{./jntm.mp3}}
>这是什么动物的叫声{{./jntm.mp3}}?
>):
>
>这是什么声音{{./jntm.mp3}}??????
>
>
>Start audio inference...
>Received prompt: ====<|im_start|>system
>You are a helpful assistant.<|im_end|>
><|im_start|>user
>Audio 1: <image>
>这是什么声音??????<|im_end|>
><|im_start|>assistant
>
>====
>/home/firefly/mnt/zt-back/Qwen2-7B-audiow/./multiprocess_inference.py:43: UserWarning: PySoundFile failed. Trying audioread instead.
>  audio, _ = librosa.load(audio_path, sr=feature_extractor.sampling_rate)
>/home/firefly/.local/lib/python3.9/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
>  Deprecated as of librosa version 0.10.0.
>  It will be removed in librosa version 1.0.
>  y, sr_native = __audioread_load(path, offset, duration, dtype)
>Audio encoder inference time: 12.22 seconds
>(1, 251, 4096)
>(1, 251, 4096)
>Start LLM inference...
>🎉 完成!
>
>Time to first token: 4.28 seconds
>语音中是一段音乐,包含唱歌和乐器演奏。背景音乐里有鼓声、贝斯、钢琴和小号的演奏,同时背景能够听到胃里咕咕作响和吃东西的声音。这首歌可能是用于广告。
>
>(finished)
>
>--------------------------------------------------------------------------------------
> Stage      Total Time (ms)   Tokens   Time per Token (ms)   Tokens per Second
>--------------------------------------------------------------------------------------
> Prefill    4269.62           283      15.09                 66.28
> Generate   13279.37          49       272.13                3.67
>--------------------------------------------------------------------------------------
>
>```

## 模型转换

#### 准备工作

1. 安装rknn-toolkit2 v2.3.0或更高版本, 以及rkllm-toolkit v1.1.2或更高版本.
2. 下载此仓库到本地, 但不需要下载`.rkllm`和`.rknn`结尾的模型文件.
3. 下载Qwen2-Audio-7B-Instruct的huggingface模型仓库到本地. (https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)

#### 转换LLM

1. 将此仓库中的`rename_tensors.py`文件复制到Qwen2-Audio-7B-Instruct的huggingface模型仓库根目录并运行. 稍等片刻, 会生成`model-renamed-00001-of-00004.safetensors`等4个safetensors文件和一个json文件.
2. 不用管那个json文件, 将那4个safetensors文件移动到此仓库根目录下.
3. 执行`rkllm-convert.py`. 等一会, 会生成`qwen.rkllm`, 就是转换后的模型.

#### 转换音频编码器

1. 打开`audio_encoder_export_onnx.py`, 修改文件最下方模型路径为Qwen2-Audio-7B-Instruct模型文件夹的路径. 然后执行. 等一会, 会生成`audio_encoder.onnx`和很多权重文件.
2. 执行`audio_encoder_convert_rknn.py all`. 等一会, 会生成`audio_encoder.rknn`, 这就是转换后的音频编码器.

## 已知问题

- 由于疑似RKLLM中存在的问题, 如果音频编码器和LLM加载进同一个Python进程, 可能会导致LLM推理时报错段错误. 可以使用多进程来解决. 参考`multiprocess_inference.py`.
- 由于RKLLM中存在的问题, 输入序列较长时LLM推理会段错误. https://github.com/airockchip/rknn-llm/issues/123
- 由于RKLLM的多模态输入的限制, 在整个对话中只能加载一段音频. 可以通过Embedding输入的方式来解决, 但我没有实现.
- 没有实现多轮对话.
- RKLLM的w8a8量化貌似存在不小的精度损失, 并且这个模型的量化校准数据使用了RKLLM自带的wikitext数据集, 可能会导致精度明显下降.

## 参考

- [Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)
- [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B)

## English README

# Qwen2-Audio-7B-Instruct-rkllm

Run the powerful Qwen2-Audio-7B-Instruct audio model on the RK3588!

- Inference speed (RK3588, 10 s audio input): audio encoder 12.2 s (single NPU core) + LLM prefill 4.4 s (282 tokens / 64.7 tps) + decoding at 3.69 tps
- Memory usage (RK3588, context length 768): 11.6 GB

## Usage

1. Clone or download this repository. The model is large; make sure you have enough disk space.

2. The RKNPU2 kernel driver on your board must be version >= 0.9.6 to run a model this large.
Check the driver version with root privileges:
```bash
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
```
If the version is too low, update the driver. You may need to update the kernel, or consult the official documentation for help.

3. Install the dependencies:

```bash
pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers
```

4. Run:

```bash
python multiprocess_inference.py
```

If real-world performance falls short, configure the CPU governor so the cores stay at their highest frequency, and pin the inference program to the big cores (`taskset -c 4-7 python multiprocess_inference.py`).

If you encounter llvm-related errors, update the llvmlite library: `pip install --upgrade llvmlite`
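
On most Linux kernels the cpufreq governor can be switched from sysfs; the commands below are a sketch of that tuning step (the sysfs paths and available governors vary by kernel and vendor image, so verify them on your board first):

```shell
# Keep all CPU clusters at their highest frequency (run as root).
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
    echo performance > "$policy/scaling_governor"
done

# Then pin the inference program to the big cores (cores 4-7 on RK3588).
taskset -c 4-7 python multiprocess_inference.py
```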

## Model Conversion

#### Preparation

1. Install rknn-toolkit2 v2.3.0 or newer, and rkllm-toolkit v1.1.2 or newer.
2. Download this repository locally; the model files ending in `.rkllm` and `.rknn` are not needed.
3. Download the Qwen2-Audio-7B-Instruct huggingface model repository locally. (https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)

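
If you have the `huggingface_hub` command-line tool installed, one way to fetch the upstream repository is shown below (flag names as in recent `huggingface_hub` releases; adjust to your installed version):

```shell
# Install the CLI, then download the upstream model repository into a local folder.
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2-Audio-7B-Instruct --local-dir Qwen2-Audio-7B-Instruct
```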
#### Converting the LLM

1. Copy `rename_tensors.py` from this repository into the root directory of the Qwen2-Audio-7B-Instruct huggingface model repository and run it. After a short wait it produces four safetensors files such as `model-renamed-00001-of-00004.safetensors`, plus a json file.
2. Ignore the json file and move the four safetensors files into the root directory of this repository.
3. Run `rkllm-convert.py`. After a while it produces `qwen.rkllm`, the converted model.

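Conceptually, the renaming step remaps checkpoint tensor names so that the LLM weights look like a standalone language model to the converter. The sketch below is a hypothetical illustration of that kind of key remapping; the `language_model.` prefix and the mapping rule are assumptions, not the actual logic of `rename_tensors.py` (which operates on the safetensors shards themselves):

```python
# Hypothetical sketch: strip a multimodal wrapper prefix from LLM weight names
# so they match the layout a plain-LLM converter expects. The real mapping
# used by rename_tensors.py may differ.
def rename_key(key: str, prefix: str = "language_model.") -> str:
    return key[len(prefix):] if key.startswith(prefix) else key

state = {
    "language_model.model.layers.0.self_attn.q_proj.weight": None,
    "audio_tower.conv1.weight": None,  # non-LLM tensors pass through unchanged
}
renamed = {rename_key(k): v for k, v in state.items()}
print(sorted(renamed))
```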
192
+
193
+ #### Converting Audio Encoder
194
+
195
+ 1. Open `audio_encoder_export_onnx.py`, modify the model path at the bottom of the file to the path of your Qwen2-Audio-7B-Instruct model folder. Then execute it. Wait a while, it will generate `audio_encoder.onnx` and many weight files.
196
+ 2. Execute `audio_encoder_convert_rknn.py all`. Wait a while, it will generate `audio_encoder.rknn`, which is the converted audio encoder.
197
+
198
+ ## Known Issues
199
+
200
+ - Due to a suspected issue in RKLLM, if the audio encoder and LLM are loaded into the same Python process, it may cause segmentation fault during LLM inference. This can be solved using multiprocessing. Refer to `multiprocess_inference.py`.
201
+ - Due to an issue in RKLLM, LLM inference will segfault with long input sequences. See https://github.com/airockchip/rknn-llm/issues/123
202
+ - Due to RKLLM's multimodal input limitations, only one audio clip can be loaded in the entire conversation. This could be solved using Embedding input, but I haven't implemented it.
203
+ - Multi-turn dialogue is not implemented.
204
+ - RKLLM's w8a8 quantization seems to have significant precision loss, and this model's quantization calibration data uses RKLLM's built-in wikitext dataset, which may lead to noticeable accuracy degradation.
205
+
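The multiprocessing workaround mentioned in the first issue above amounts to giving each NPU runtime its own address space and exchanging data through queues. The sketch below shows only that pattern; the worker body is a stub, whereas in `multiprocess_inference.py` the processes would load the RKNN encoder and the RKLLM model respectively:

```python
import multiprocessing

# Use the fork start method (the Linux default) so this sketch also works when
# called from module top level.
mp = multiprocessing.get_context("fork")

def encoder_worker(requests, results):
    # Stub: the real worker process would load audio_encoder.rknn here, so a
    # crash in one runtime cannot take down the process holding the other.
    for audio_path in iter(requests.get, None):  # None is the shutdown signal
        results.put(f"embeddings for {audio_path}")

def run_encoder(audio_path):
    requests, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=encoder_worker, args=(requests, results))
    worker.start()
    requests.put(audio_path)
    embeddings = results.get()  # block until the worker replies
    requests.put(None)          # ask the worker to exit
    worker.join()
    return embeddings

print(run_encoder("test.mp3"))  # -> embeddings for test.mp3
```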
## References

- [Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)
- [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B)