Ubuntu committed
Commit • 2439f29
Parent(s): f2f7792
add model files
Browse files
- .gitattributes +1 -0
- README.md +140 -0
- llama-2-13b-chat-f16.gguf +3 -0
- llama-2-13b-chat-q2_k.gguf +3 -0
- llama-2-13b-chat-q5_k_m.gguf +3 -0
- llama-2-13b-f16.gguf +3 -0
- llama-2-13b-q2_k.gguf +3 -0
- llama-2-13b-q5_k_m.gguf +3 -0
- llama-2-7b-chat-f16.gguf +3 -0
- llama-2-7b-chat-q2_k.gguf +3 -0
- llama-2-7b-chat-q5_k_m.gguf +3 -0
- llama-2-7b-f16.gguf +3 -0
- llama-2-7b-q2_k.gguf +3 -0
- llama-2-7b-q5_k_m.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,140 @@
---
language:
- en
license: llama2
tags:
- meta
- llama-2
- wasmedge
- second-state
- llama.cpp
model_name: Llama 2 GGUF
inference: false
model_creator: Meta Llama 2
model_type: llama
pipeline_tag: text-generation
prompt_template: '[INST] <<SYS>>

You are a helpful, respectful and honest assistant. Always answer as helpfully as
possible, while being safe. Your answers should not include any harmful, unethical,
racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses
are socially unbiased and positive in nature. If a question does not make any sense,
or is not factually coherent, explain why instead of answering something not correct.
If you don''t know the answer to a question, please don''t share false information.

<</SYS>>

{prompt}[/INST]

'
quantized_by: wasmedge
---

This repo contains GGUF model files for cross-platform AI inference using the [WasmEdge Runtime](https://github.com/WasmEdge/WasmEdge).
[Learn more](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359) about why and how.

## Prerequisite

Install WasmEdge with the GGML plugin.

```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
```
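
The installer may ask you to load an env file before `wasmedge` appears on your `PATH`. A quick sanity check (assuming the installer's default `$HOME/.wasmedge` location; adjust if you installed elsewhere):

```
# Load the WasmEdge environment (default install location), then verify the CLI runs.
source $HOME/.wasmedge/env
wasmedge --version
```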

Download the cross-platform Wasm apps for inference.

```
curl -LO https://github.com/second-state/llama-utils/raw/main/simple/llama-simple.wasm
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
```
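
You will also need at least one `.gguf` file from this repo in your working directory. A minimal sketch, assuming the standard Hugging Face `resolve/main` download URL pattern, with `<repo-id>` as a placeholder for this repository's `owner/name`:

```
# Download a model file via the standard Hugging Face resolve URL.
# <repo-id> is a placeholder -- substitute this repository's actual owner/name.
curl -LO https://huggingface.co/<repo-id>/resolve/main/llama-2-7b-chat-q5_k_m.gguf
```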

## Use the f16 models

The f16 version is a GGUF equivalent of the original llama2 models. It gives the best quality inference results, but also consumes the most computing resources, in both VRAM and computing time. The f16 models are also a great basis for fine-tuning.
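
Every inference command below follows the same pattern; only the model file and the Wasm app change. A template with the pieces labeled (`<model>` and `<app>` are placeholders; the field names are my reading of the `--nn-preload` string as `alias:backend:target:model`, not official documentation):

```
# --dir .:.      maps the current directory into the Wasm sandbox so the app
#                can read the model file
# --nn-preload   default:GGML:CPU:<model>.gguf
#                = graph alias : backend : execution target : model file
wasmedge --dir .:. --nn-preload default:GGML:CPU:<model>.gguf <app>.wasm
```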

Chat with the 7b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-f16.gguf llama-chat.wasm
```

Generate text with the 7b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```

Chat with the 13b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-f16.gguf llama-chat.wasm
```

Generate text with the 13b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```

## Use the quantized models

The `q5_k_m` version is a quantized version of the llama2 models. Judging by the file sizes in this repo, it is roughly a third of the size of the f16 models, and hence consumes far less VRAM, while still giving high quality inference results.
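
To compare quantization levels side by side, you can loop the same completion prompt over several variants (a sketch; it assumes the listed files have already been downloaded into the current directory):

```
# Run one prompt against each 7b base-model variant for a quick quality check.
for model in llama-2-7b-f16.gguf llama-2-7b-q5_k_m.gguf llama-2-7b-q2_k.gguf; do
  echo "== $model =="
  wasmedge --dir .:. --nn-preload default:GGML:CPU:$model llama-simple.wasm \
    "Robert Oppenheimer's most important achievement is "
done
```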

Chat with the 7b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
```

Generate text with the 7b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```

Chat with the 13b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
```

Generate text with the 13b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```

## Resource-constrained models

The `q2_k` version is the smallest quantized version of the llama2 models. These models can run on devices with only 4GB of RAM, but the inference quality is rather low.
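
Before picking a variant, it can help to compare model file size against available memory (generic commands, nothing specific to this repo):

```
ls -lh *.gguf   # on-disk size of each downloaded model
free -h         # available RAM on Linux; use vm_stat on macOS
```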

Chat with the 7b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q2_k.gguf llama-chat.wasm
```

Generate text with the 7b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```

Chat with the 13b chat model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q2_k.gguf llama-chat.wasm
```

Generate text with the 13b base model:

```
wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
```
llama-2-13b-chat-f16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:742b49f6a953717a511e60e480fd6c338e3acdab8b363034da247c268930b56a
size 26033303264
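
Each entry here is a Git LFS pointer, so the `oid` line doubles as a checksum for the real file. To verify a finished download against it (standard `sha256sum` usage):

```
# The printed digest should match the oid recorded in the LFS pointer above.
sha256sum llama-2-13b-chat-f16.gguf
# expected: 742b49f6a953717a511e60e480fd6c338e3acdab8b363034da247c268930b56a
```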
llama-2-13b-chat-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ecfda81b21c88db496a48f76e758dd5c5289f52c85f8820896915248a11ef8a3
size 5429348096
llama-2-13b-chat-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:69bcbe0985c12f01ebe56d837beaca6febd83a66cf3936b09e2bd1430035949d
size 9229924096
llama-2-13b-f16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d021c71b94603bf4add6035912e1e78780c3b51fdafd1f0408d8438b20caff25
size 26033303264
llama-2-13b-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3db4d5339cb55368e972c64e782e0ed4592fc69b651089cab1fdfeaf0a3d9398
size 5429348096
llama-2-13b-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ffc205b67beb10be50fa588a8d0bb29d3538bdc5d202388338004c13d3c97af3
size 9229924096
llama-2-7b-chat-f16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d0c47e0b78d46d59bf9ffe986072e3206aabb9f4fe66d263ccedc689a3d1c269
size 13478104576
llama-2-7b-chat-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0da39ea6df19f861d46759aa1aa6e493c13cc92206047d911a00213b6eafa5cd
size 2825940544
llama-2-7b-chat-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34acb8c183aeb92be6fa4c63b3911ddcef6c1c52d4cdfd90a38dd7642546c1a9
size 4783156800
llama-2-7b-f16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87dc1541426f2b23f57d3af817d156b4eeb8facc34e46254aa61602ab86cc7b0
size 13478104576
llama-2-7b-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:239452f5665e598afa74443935cf983674e9aad94c8a835393cd9b4ee92978cc
size 2825940544
llama-2-7b-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c61b678f868be0ca48fa4fa5a8f37e2638e1aae2618a271c67a3c2fb7be55aac
size 4783156800