Ubuntu committed
Commit 2439f29
1 Parent(s): f2f7792

add model files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,140 @@
+ ---
+ language:
+ - en
+ license: llama2
+ tags:
+ - meta
+ - llama-2
+ - wasmedge
+ - second-state
+ - llama.cpp
+ model_name: Llama 2 GGUF
+ inference: false
+ model_creator: Meta Llama 2
+ model_type: llama
+ pipeline_tag: text-generation
+ prompt_template: '[INST] <<SYS>>
+
+ You are a helpful, respectful and honest assistant. Always answer as helpfully as
+ possible, while being safe. Your answers should not include any harmful, unethical,
+ racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses
+ are socially unbiased and positive in nature. If a question does not make any sense,
+ or is not factually coherent, explain why instead of answering something not correct.
+ If you don''t know the answer to a question, please don''t share false information.
+
+ <</SYS>>
+
+ {prompt}[/INST]
+
+ '
+ quantized_by: wasmedge
+ ---
+
+ This repo contains GGUF model files for cross-platform AI inference using the [WasmEdge Runtime](https://github.com/WasmEdge/WasmEdge).
+ [Learn more](https://medium.com/stackademic/fast-and-portable-llama2-inference-on-the-heterogeneous-edge-a62508e82359) about why and how.
+
+
+ ## Prerequisite
+
+
+ Install WasmEdge with the GGML plugin.
+
+ ```
+ curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
+ ```
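+
+ Depending on your shell, you may then need to source the environment file the installer writes (by default at `$HOME/.wasmedge/env`) so that the `wasmedge` binary is on your `PATH`:
+
+ ```
+ source $HOME/.wasmedge/env
+ ```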
+
+ Download the cross-platform Wasm apps for inference.
+
+ ```
+ curl -LO https://github.com/second-state/llama-utils/raw/main/simple/llama-simple.wasm
+
+ curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
+ ```
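+
+ You will also need at least one GGUF model file from this repo in your working directory. Hugging Face serves LFS files at `resolve/main` URLs; the command below is a sketch with `<this-repo-id>` as a placeholder for this repository's `owner/name` path. To sanity-check the setup, print the WasmEdge version and list the downloaded files.
+
+ ```
+ # Placeholder URL: substitute this repo's actual id for <this-repo-id>.
+ curl -LO https://huggingface.co/<this-repo-id>/resolve/main/llama-2-7b-chat-q5_k_m.gguf
+
+ # Verify the runtime and the downloaded apps.
+ wasmedge --version
+ ls -l llama-simple.wasm llama-chat.wasm *.gguf
+ ```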
+
+ ## Use the f16 models
+
+
+ The f16 version is a GGUF equivalent of the original Llama 2 models. It gives the best inference quality, but also consumes the most resources, in both VRAM and compute time. The f16 models are also a good basis for fine-tuning.
+
+ Chat with the 7b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-f16.gguf llama-chat.wasm
+ ```
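+
+ A note on the flags (general WasmEdge behavior, not specific to this model): `--dir .:.` grants the Wasm app access to the current directory so it can read the model file, and the `--nn-preload` value follows the pattern below, where `default` is the alias the app uses to look the model up, `GGML` selects the llama.cpp backend of the WASI-NN plugin, and `CPU` is the execution target.
+
+ ```
+ --nn-preload <alias>:<backend>:<target>:<gguf-file>
+ ```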
+
+ Generate text with the 7b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
+
+ Chat with the 13b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-f16.gguf llama-chat.wasm
+ ```
+
+ Generate text with the 13b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-f16.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
+
+
+ ## Use the quantized models
+
+
+ The `q5_k_m` versions are quantized versions of the Llama 2 models. They are roughly a third of the size of the f16 models, and hence consume far less VRAM, while still giving high-quality inference results.
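+
+ For reference, the approximate file sizes in this repo (taken from the LFS pointers in this commit, in decimal GB; the chat and base variants of each size class are identical in size):
+
+ ```
+ ls -lh *.gguf
+ # llama-2-7b-q2_k.gguf      ~2.8 GB
+ # llama-2-7b-q5_k_m.gguf    ~4.8 GB
+ # llama-2-7b-f16.gguf       ~13.5 GB
+ # llama-2-13b-q2_k.gguf     ~5.4 GB
+ # llama-2-13b-q5_k_m.gguf   ~9.2 GB
+ # llama-2-13b-f16.gguf      ~26.0 GB
+ ```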
+
+ Chat with the 7b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
+ ```
+
+ Generate text with the 7b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
+
+ Chat with the 13b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
+ ```
+
+ Generate text with the 13b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
+
+ ## Resource-constrained models
+
+
+ The `q2_k` versions are the smallest quantized versions of the Llama 2 models. They can run on devices with only 4GB of RAM, but the inference quality is rather low.
+
+ Chat with the 7b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q2_k.gguf llama-chat.wasm
+ ```
+
+ Generate text with the 7b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
+
+ Chat with the 13b chat model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q2_k.gguf llama-chat.wasm
+ ```
+
+ Generate text with the 13b base model
+
+ ```
+ wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q2_k.gguf llama-simple.wasm "Robert Oppenheimer's most important achievement is "
+ ```
llama-2-13b-chat-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:742b49f6a953717a511e60e480fd6c338e3acdab8b363034da247c268930b56a
+ size 26033303264
llama-2-13b-chat-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecfda81b21c88db496a48f76e758dd5c5289f52c85f8820896915248a11ef8a3
+ size 5429348096
llama-2-13b-chat-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69bcbe0985c12f01ebe56d837beaca6febd83a66cf3936b09e2bd1430035949d
+ size 9229924096
llama-2-13b-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d021c71b94603bf4add6035912e1e78780c3b51fdafd1f0408d8438b20caff25
+ size 26033303264
llama-2-13b-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3db4d5339cb55368e972c64e782e0ed4592fc69b651089cab1fdfeaf0a3d9398
+ size 5429348096
llama-2-13b-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffc205b67beb10be50fa588a8d0bb29d3538bdc5d202388338004c13d3c97af3
+ size 9229924096
llama-2-7b-chat-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0c47e0b78d46d59bf9ffe986072e3206aabb9f4fe66d263ccedc689a3d1c269
+ size 13478104576
llama-2-7b-chat-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0da39ea6df19f861d46759aa1aa6e493c13cc92206047d911a00213b6eafa5cd
+ size 2825940544
llama-2-7b-chat-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34acb8c183aeb92be6fa4c63b3911ddcef6c1c52d4cdfd90a38dd7642546c1a9
+ size 4783156800
llama-2-7b-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87dc1541426f2b23f57d3af817d156b4eeb8facc34e46254aa61602ab86cc7b0
+ size 13478104576
llama-2-7b-q2_k.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:239452f5665e598afa74443935cf983674e9aad94c8a835393cd9b4ee92978cc
+ size 2825940544
llama-2-7b-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c61b678f868be0ca48fa4fa5a8f37e2638e1aae2618a271c67a3c2fb7be55aac
+ size 4783156800