|
--- |
|
base_model: |
|
- Qwen/Qwen2.5-7B |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
tags: |
|
- rknn |
|
- rkllm |
|
- chat |
|
- rk3588 |
|
--- |
|
## 3ib0n's RKLLM Guide |
|
These models and binaries require an RK3588 board running rknpu driver version 0.9.7 or above. |
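A quick way to check the driver requirement on the board is to compare the reported version against 0.9.7. The sketch below is only an illustration: the debugfs path `/sys/kernel/debug/rknpu/version` and the `RKNPU driver: v0.9.7` string format are assumptions about a typical RK3588 image, and the helper names are hypothetical. Reading the file usually needs root.

```python
import re

MIN_DRIVER = (0, 9, 7)  # minimum rknpu driver version required by this guide


def parse_driver_version(text):
    """Extract (major, minor, patch) from a string such as 'RKNPU driver: v0.9.7'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def driver_is_supported(text, minimum=MIN_DRIVER):
    """Return True when the reported driver version is at least `minimum`."""
    return parse_driver_version(text) >= minimum


if __name__ == "__main__":
    # Assumption: the driver exposes its version in this debugfs file.
    version_file = "/sys/kernel/debug/rknpu/version"
    try:
        with open(version_file) as handle:
            reported = handle.read().strip()
    except OSError as err:
        print(f"could not read {version_file}: {err}")
    else:
        status = "ok" if driver_is_supported(reported) else "too old"
        print(f"{reported} -> {status}")
```

Comparing tuples of integers rather than raw strings keeps a version such as 0.10.0 ordered after 0.9.7.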
|
|
|
## Steps to reproduce conversion |
|
```shell |
|
# Download and setup miniforge3 |
|
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh" |
|
bash Miniforge3-$(uname)-$(uname -m).sh |
|
|
|
# activate the base environment |
|
source ~/miniforge3/bin/activate |
|
|
|
# create and activate a python 3.8 environment |
|
conda create -n rknn-llm-1.1.4 python=3.8 |
|
conda activate rknn-llm-1.1.4 |
|
|
|
# clone the latest rknn-llm toolkit |
|
git clone https://github.com/airockchip/rknn-llm.git |
|
|
|
# install dependencies for the toolkit |
|
pip install transformers accelerate torchvision rknn-toolkit2==2.2.1 |
|
pip install --upgrade torch pillow |
|
|
|
# install rkllm |
|
pip install rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl |
|
|
|
# edit or create a script to export rkllm models |
|
cd rknn-llm/examples/rkllm_multimodel_demo |
|
nano export/export_rkllm.py # update input and output paths |
|
python export/export_rkllm.py |
|
``` |
|
|
|
Example export_rkllm.py modified from https://github.com/airockchip/rknn-llm/blob/main/examples/rkllm_multimodel_demo/export/export_rkllm.py |
|
```python |
|
import os |
|
from rkllm.api import RKLLM |
|
from datasets import load_dataset |
|
from transformers import AutoTokenizer |
|
from tqdm import tqdm |
|
import torch |
|
from torch import nn |
|
|
|
modelpath = os.path.expanduser("~/models/Qwen/Qwen2.5-7B-Instruct/") ## UPDATE HERE |
|
savepath = './Qwen2.5-7B-Instruct.rkllm' ## UPDATE HERE |
|
llm = RKLLM() |
|
|
|
# Load model |
|
# Use 'export CUDA_VISIBLE_DEVICES=2' to specify GPU device |
|
ret = llm.load_huggingface(model=modelpath, device='cpu') |
|
if ret != 0: |
|
    print('Load model failed!') |
|
    exit(ret) |
|
|
|
# Build model |
|
qparams = None |
|
|
|
## Do not pass the dataset parameter: this converts a pure text model, not a multimodal one |
|
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8', |
|
quantized_algorithm='normal', target_platform='rk3588', num_npu_core=3, extra_qparams=qparams) |
|
|
|
if ret != 0: |
|
    print('Build model failed!') |
|
    exit(ret) |
|
|
|
# Export rkllm model |
|
ret = llm.export_rkllm(savepath) |
|
if ret != 0: |
|
    print('Export model failed!') |
|
    exit(ret) |
|
``` |
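Python does not expand `~` in path strings on its own, so it can help to resolve and validate the model directory before handing it to `llm.load_huggingface`. The helper below is a minimal sketch; `resolve_model_dir` is a hypothetical name, and the check for `config.json` is an assumption about what a usable Hugging Face checkpoint directory contains.

```python
import os


def resolve_model_dir(path):
    """Expand '~' and confirm the directory looks like a Hugging Face checkpoint."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        raise FileNotFoundError(f"model directory not found: {full}")
    # Assumption: a usable checkpoint directory contains config.json.
    if not os.path.isfile(os.path.join(full, "config.json")):
        raise FileNotFoundError(f"no config.json in {full}")
    return full
```

In the export script this could be used as `modelpath = resolve_model_dir("~/models/Qwen/Qwen2.5-7B-Instruct/")`, so a wrong path fails early with a clear message instead of partway through loading.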
|
|
|
## Steps to build and run demo |
|
|
|
```shell |
|
# Download the correct toolchain for working with rkllm |
|
# Documentation here: https://github.com/airockchip/rknn-llm/blob/main/doc/Rockchip_RKLLM_SDK_EN_1.1.0.pdf |
|
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz |
|
tar -xf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz |
|
|
|
# ensure that the gcc compiler path points to the directory where the toolchain downloaded earlier was unpacked |
|
nano deploy/build-linux.sh # update the gcc compiler path |
|
|
|
# compile the demo app |
|
cd deploy/ |
|
./build-linux.sh |
|
``` |
|
|
|
## Steps to run the app |
|
More information and original guide: https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_multimodel_demo |
|
```shell |
|
# push install dir to device |
|
adb push ./install/demo_Linux_aarch64 /data |
|
# push model file to device |
|
adb push Qwen2.5-7B-Instruct.rkllm /data/models |
|
|
|
adb shell |
|
cd /data/demo_Linux_aarch64 |
|
# export lib path |
|
export LD_LIBRARY_PATH=./lib |
|
# soft link models dir |
|
ln -s /data/models . |
|
# run llm (pure text example) |
|
./llm models/Qwen2.5-7B-Instruct.rkllm 128 512 |
|
``` |
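The two numbers on the `./llm` command line are easy to mix up. The sketch below builds the same command from named values; treating them as `max_new_tokens` and `max_context_len` is an assumption based on the upstream demo's usage message, and `build_llm_command` is a hypothetical helper, not part of the toolkit.

```python
import shlex


def build_llm_command(model_path, max_new_tokens=128, max_context_len=512):
    """Return the argv list for the demo binary.

    Assumption: the positional arguments are the model path, then
    max_new_tokens, then max_context_len.
    """
    if max_new_tokens <= 0 or max_context_len <= 0:
        raise ValueError("token limits must be positive integers")
    return ["./llm", model_path, str(max_new_tokens), str(max_context_len)]


# Print the command in a form that can be pasted into the adb shell.
print(shlex.join(build_llm_command("models/Qwen2.5-7B-Instruct.rkllm")))
```

With the defaults this prints the same command as shown in the steps above.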