juntaoyuan committed
Commit 23de599
1 Parent(s): 2439f29

Update README.md

Files changed (1):
  1. README.md +16 -17
README.md CHANGED
@@ -51,63 +51,62 @@ curl -LO https://github.com/second-state/llama-utils/raw/main/simple/llama-simple.wasm
 curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm
 ```
 
-## Use the f16 models
+## Use the quantized models
 
 
-The f16 version is an GGUF equivalent of the original llama2 models. It gives the best quality inference results but also consumes the most computing resources in both VRAM and computing time. The f16 models are also great as a basis for fine-tuning.
+The `q5_k_m` version is a quantized version of the llama2 models. They are only half of the size of the original models, and hence consume half as much VRAM, but still give high-quality inference results.
 
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-f16.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-f16.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q5_k_m.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-f16.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-f16.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
 ```
 
-
-## Use the quantized models
+## Use the f16 models
 
 
-The `q5_k_m` version is a quantized version of the llama2 models. They are only half of the size of the original models, and hence consumes half as much VRAM, but still gives high quality inference results.
+The f16 version is the GGUF equivalent of the original llama2 models. It gives the best quality inference results but also consumes the most computing resources in both VRAM and computing time. The f16 models are also great as a basis for fine-tuning.
 
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-f16.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q5_k_m.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-f16.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-f16.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q5_k_m.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-f16.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
 ```
 
 ## Resource constrained models
@@ -118,23 +117,23 @@ The `q2_k` version is the smallest quantized version of the llama2 models. They
 Chat with the 7b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat-q2_k.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q2_k.gguf llama-chat.wasm
 ```
 
 Generate text with the 7b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-q2_k.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-q2_k.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
 ```
 
 Chat with the 13b chat model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-chat-q2_k.gguf llama-chat.wasm
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q2_k.gguf llama-chat.wasm
 ```
 
 Generate text with the 13b base model
 
 ```
-wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-13b-q2_k.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q2_k.gguf llama-simple.wasm 'Robert Oppenheimer most important achievement is '
```
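
For anyone running the commands in this README, the `--nn-preload` value is a colon-separated spec that registers a model with WasmEdge's WASI-NN plugin: a model alias (`default`), the backend (`GGML`), the execution target, and the GGUF file. Switching the target from `CPU` to `AUTO` lets the runtime pick the best available device at startup instead of pinning inference to the CPU. Below is a minimal end-to-end sketch for the 7b chat model in its q5_k_m quantization; it assumes a WasmEdge install with the WASI-NN GGML plugin, and the Hugging Face download URL is illustrative (`<owner>/<repo>` is a placeholder, not a confirmed path):

```
# Fetch the chat app (same URL as in the README above).
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm

# Fetch the model; replace <owner>/<repo> with the repository hosting the GGUF file.
curl -LO https://huggingface.co/<owner>/<repo>/resolve/main/llama-2-7b-chat-q5_k_m.gguf

# alias=default, backend=GGML, target=AUTO, model=llama-2-7b-chat-q5_k_m.gguf
# --dir .:. maps the current directory into the sandbox so the .gguf file is visible.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-chat.wasm
```

On hosts without a supported GPU, `AUTO` should fall back to CPU execution, so the same command stays portable across machines.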