Pelochus committed
Commit 9d4dfc6
1 Parent(s): 71f0fc0

Update README.md

Files changed (1):
README.md +9 -4
README.md CHANGED
@@ -15,14 +15,18 @@ This repo contains the converted models for running on the RK3588 NPU found in S
  Check the main repo on GitHub for how to install and use: https://github.com/Pelochus/ezrknpu
 
  ## Available LLMs
+ Before running any LLM, take into account that the required RAM is between 1.5 and 3 times the model size (this is an estimate; I haven't done extensive testing yet).
+
  Right now, only the following models have been converted:
  | LLM | Parameters | Link |
  | --------------------- | ----------- | ---------------------------------------------------------- |
  | Qwen Chat | 1.8B | https://huggingface.co/Pelochus/qwen-1_8B-rk3588 |
  | Microsoft Phi-2 | 2.7B | https://huggingface.co/Pelochus/phi-2-rk3588 |
- | TinyLlama v1 | 1.1B | https://huggingface.co/Pelochus/tinyllama-v1-rk3588 |
+ | Llama 2 7B | 7B | https://huggingface.co/Pelochus/llama2-chat-7b-hf-rk3588 |
+ | Llama 2 13B | 13B | https://huggingface.co/Pelochus/llama2-chat-13b-hf-rk3588 |
+ | TinyLlama v1 (broken) | 1.1B | https://huggingface.co/Pelochus/tinyllama-v1-rk3588 |
 
- However, RKLLM also supports Qwen 2 and Llama 2 7B, but I can't convert them due to my PC only having 16 GB of RAM.
+ However, RKLLM also supports Qwen 1.5. Llama 2 was converted using Azure servers.
  For reference, converting Phi-2 peaked at about 15 GB of RAM + 25 GB of swap (counting the OS, which was using about 2 GB at most).
 
  ## Downloading a model
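To put the RAM rule of thumb in numbers: w8a8 quantization stores roughly one byte per parameter, so the 2.7B Phi-2 is about 2.7 GB of weights and should need somewhere around 4 to 8 GB of RAM to run. If you want to attempt a conversion yourself on a RAM-limited machine, a temporary swap file can stand in for the missing memory. A minimal sketch, assuming a typical Linux system; the 32 GB size and the /swapfile path are arbitrary choices, not taken from this repo:

```bash
# Create and enable a 32 GB swap file (sized from the ~25 GB of swap
# that the Phi-2 conversion reportedly peaked at)
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile   # swap files must not be readable by other users
sudo mkswap /swapfile
sudo swapon /swapfile

# Once the conversion finishes, remove the swap file again
sudo swapoff /swapfile
sudo rm /swapfile
```

Expect a conversion that spills into swap to be very slow; this makes it possible, not fast.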
@@ -38,7 +42,7 @@ If the first clone gives you problems (takes too long) you can also:
 
  `GIT_LFS_SKIP_SMUDGE=1 git clone LINK_FROM_PREVIOUS_TABLE_HERE`
 
- Then your model will be inside the downloaded folder.
+ And then run `git lfs pull` inside the cloned folder to download the full model.
 
  ## RKLLM parameters used
  RK3588 **only supports w8a8 quantization**, so that was the selected quantization for ALL models.
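Putting the two download steps together, with the Phi-2 repo from the table above standing in for `LINK_FROM_PREVIOUS_TABLE_HERE`:

```bash
git lfs install   # one-time Git LFS setup, if not done already

# Clone without downloading the large LFS files up front
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Pelochus/phi-2-rk3588
cd phi-2-rk3588

# Fetch the actual model weights that the clone skipped
git lfs pull
```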
@@ -46,7 +50,8 @@ Aside from that, RKLLM toolkit allows for no optimization (0) and optimization (1)
  All models are optimized.
 
  ## Future additions
- - [ ] Converting Qwen 2 and Llama 2
+ - [x] Converting Llama 2 (70B currently in conversion, but that won't run even with 32 GB of RAM)
+ - [ ] Converting Qwen 1.5 (from 0.5B to 7B)
  - [ ] Adding other compatible Rockchip SoCs
 
  ## More info