Pelochus committed
Commit 9d4dfc6
1 Parent(s): 71f0fc0

Update README.md

Files changed (1):
README.md +9 -4
README.md CHANGED
@@ -15,14 +15,18 @@ This repo contains the converted models for running on the RK3588 NPU found in S
  Check the main repo on GitHub for how to install and use: https://github.com/Pelochus/ezrknpu
 
  ## Available LLMs
+ Before running any LLM, take into account that the required RAM is between 1.5 and 3 times the model size (this is an estimate; I haven't done extensive testing yet).
+
  Right now, only the following models have been converted:
  | LLM | Parameters | Link |
  | --------------------- | ----------- | ---------------------------------------------------------- |
  | Qwen Chat | 1.8B | https://huggingface.co/Pelochus/qwen-1_8B-rk3588 |
  | Microsoft Phi-2 | 2.7B | https://huggingface.co/Pelochus/phi-2-rk3588 |
- | TinyLlama v1 | 1.1B | https://huggingface.co/Pelochus/tinyllama-v1-rk3588 |
+ | Llama 2 7B | 7B | https://huggingface.co/Pelochus/llama2-chat-7b-hf-rk3588 |
+ | Llama 2 13B | 13B | https://huggingface.co/Pelochus/llama2-chat-13b-hf-rk3588 |
+ | TinyLlama v1 (broken) | 1.1B | https://huggingface.co/Pelochus/tinyllama-v1-rk3588 |
 
- However, RKLLM also supports Qwen 2 and Llama 2 7B, but I can't convert them due to my PC only having 16 GB of RAM.
+ However, RKLLM also supports Qwen 1.5. Llama 2 was converted using Azure servers.
  For reference, converting Phi-2 peaked at about 15 GB of RAM + 25 GB of swap (counting the OS, which was using about 2 GB at most).
 
  ## Downloading a model
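To put the RAM rule of thumb in numbers: w8a8 quantization stores roughly one byte per parameter, so the 2.7B Phi-2 is about 2.7 GB of weights and should need somewhere around 4 to 8 GB of RAM to run. If you want to attempt a conversion yourself on a RAM-limited machine, a temporary swap file can stand in for the missing memory. A minimal sketch, assuming a typical Linux system; the 32 GB size and the /swapfile path are arbitrary choices, not taken from this repo:

```bash
# Create and enable a 32 GB swap file (sized from the ~25 GB of swap
# that the Phi-2 conversion reportedly peaked at)
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile   # swap files must not be readable by other users
sudo mkswap /swapfile
sudo swapon /swapfile

# Once the conversion finishes, remove the swap file again
sudo swapoff /swapfile
sudo rm /swapfile
```

Expect a conversion that spills into swap to be very slow; this makes it possible, not fast.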
@@ -38,7 +42,7 @@ If the first clone gives you problems (takes too long) you can also:
 
  `GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE`
 
- Then your model will be inside the downloaded folder.
+ And then run `git lfs pull` inside the cloned folder to download the full model.
 
  ## RKLLM parameters used
  RK3588 **only supports w8a8 quantization**, so that was the selected quantization for ALL models.
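Putting the two download steps together, with the Phi-2 repo from the table above standing in for `LINK_FROM_PREVIOUS_TABLE_HERE`:

```bash
git lfs install   # one-time Git LFS setup, if not done already

# Clone without downloading the large LFS files up front
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Pelochus/phi-2-rk3588
cd phi-2-rk3588

# Fetch the actual model weights that the clone skipped
git lfs pull
```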
@@ -46,7 +50,8 @@ Aside from that, RKLLM toolkit allows for no optimization (0) and optimization (1)
  All models are optimized.
 
  ## Future additions
- - [ ] Converting Qwen 2 and Llama 2
+ - [x] Converting Llama 2 (70B currently in conversion, but that won't run even with 32 GB of RAM)
+ - [ ] Converting Qwen 1.5 (from 0.5B to 7B)
  - [ ] Adding other compatible Rockchip SoCs
 
  ## More info