bartowski committed on
Commit 5907fda (1 parent: c920517)

Update README.md

Files changed (1): README.md (+50 -67)
README.md CHANGED
@@ -185,41 +185,40 @@ extra_gated_fields:
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
  extra_gated_description: The information you provide will be collected, stored, processed and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/).
  extra_gated_button_content: Submit
- widget:
- - example_title: Hello
-   messages:
-   - role: user
-     content: Hey my name is Julien! How are you?
- - example_title: Winter holidays
-   messages:
-   - role: system
-     content: You are a helpful and honest assistant. Please, respond concisely and truthfully.
-   - role: user
-     content: Can you recommend a good destination for Winter holidays?
- - example_title: Programming assistant
-   messages:
-   - role: system
-     content: You are a helpful and honest code and programming assistant. Please, respond concisely and truthfully.
-   - role: user
-     content: Write a function that computes the nth fibonacci number.
- inference:
-   parameters:
-     max_new_tokens: 300
-     stop:
-     - <|end_of_text|>
-     - <|eot_id|>
  quantized_by: bartowski
  ---

- ## Llamacpp imatrix Quantizations of Meta-Llama-3-8B-Instruct

- Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> commit <a href="https://github.com/ggerganov/llama.cpp/commit/ffe666572f98a686b17a2cd1dbf4c0a982e5ac0a">ffe6665</a> for quantization.

- Original model: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

- All quants were made using the imatrix option with the dataset provided by Kalomaze [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384).
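As an aside for readers unfamiliar with the process: the imatrix option computes an importance matrix over a calibration corpus and hands it to the quantizer, so heavily used weights keep more precision. A minimal sketch of that workflow, assuming a built llama.cpp checkout from around this release and an fp16 GGUF conversion of the model; the file names below are illustrative placeholders, not files in this repo:

```python
# Sketch of llama.cpp's imatrix-guided quantization (tool names and flags as
# assumed for llama.cpp builds of this era; all paths are placeholders).
import subprocess

FP16_GGUF = "Meta-Llama-3-8B-Instruct-f16.gguf"

# 1) Build an importance matrix from a calibration text file (e.g. Kalomaze's dataset).
subprocess.run(
    ["./imatrix", "-m", FP16_GGUF, "-f", "calibration.txt", "-o", "imatrix.dat"],
    check=True,
)

# 2) Quantize, letting the importance matrix steer precision toward heavily used weights.
subprocess.run(
    ["./quantize", "--imatrix", "imatrix.dat", FP16_GGUF,
     "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```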
 
 
 
 
- ## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
@@ -228,58 +227,42 @@ All quants made using imatrix option with dataset provided by Kalomaze [here](ht

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-
```
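To make the format concrete (note that the diff's hunk boundary omits the unchanged user-turn line between the two hunks), here is a small sketch that renders the full Llama 3 template; the helper name is ours, not from the README:

```python
# Render the Llama 3 chat template by hand (helper name is illustrative).
def llama3_prompt(system_prompt: str, prompt: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful AI assistant.", "Hello!"))
```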

- ## Download a file (not the whole branch) from below:

- | Filename | Quant type | File Size | Description |
- | -------- | ---------- | --------- | ----------- |
- | [Meta-Llama-3-8B-Instruct-Q8_0.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf) | Q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
- | [Meta-Llama-3-8B-Instruct-Q6_K.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q6_K.gguf) | Q6_K | 6.59GB | Very high quality, near perfect, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-Q5_K_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf) | Q5_K_M | 5.73GB | High quality, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-Q5_K_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q5_K_S.gguf) | Q5_K_S | 5.59GB | High quality, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-Q4_K_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf) | Q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-Q4_K_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q4_K_S.gguf) | Q4_K_S | 4.69GB | Slightly lower quality with more space savings, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-IQ4_NL.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ4_NL.gguf) | IQ4_NL | 4.67GB | Decent quality, slightly smaller than Q4_K_S with similar performance, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-IQ4_XS.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ4_XS.gguf) | IQ4_XS | 4.44GB | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
- | [Meta-Llama-3-8B-Instruct-Q3_K_L.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q3_K_L.gguf) | Q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
- | [Meta-Llama-3-8B-Instruct-Q3_K_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q3_K_M.gguf) | Q3_K_M | 4.01GB | Even lower quality. |
- | [Meta-Llama-3-8B-Instruct-IQ3_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ3_M.gguf) | IQ3_M | 3.78GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
- | [Meta-Llama-3-8B-Instruct-IQ3_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ3_S.gguf) | IQ3_S | 3.68GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
- | [Meta-Llama-3-8B-Instruct-Q3_K_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q3_K_S.gguf) | Q3_K_S | 3.66GB | Low quality, not recommended. |
- | [Meta-Llama-3-8B-Instruct-IQ3_XS.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ3_XS.gguf) | IQ3_XS | 3.51GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
- | [Meta-Llama-3-8B-Instruct-IQ3_XXS.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ3_XXS.gguf) | IQ3_XXS | 3.27GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
- | [Meta-Llama-3-8B-Instruct-Q2_K.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-Q2_K.gguf) | Q2_K | 3.17GB | Very low quality but surprisingly usable. |
- | [Meta-Llama-3-8B-Instruct-IQ2_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ2_M.gguf) | IQ2_M | 2.94GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
- | [Meta-Llama-3-8B-Instruct-IQ2_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ2_S.gguf) | IQ2_S | 2.75GB | Very low quality, uses SOTA techniques to be usable. |
- | [Meta-Llama-3-8B-Instruct-IQ2_XS.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ2_XS.gguf) | IQ2_XS | 2.60GB | Very low quality, uses SOTA techniques to be usable. |
- | [Meta-Llama-3-8B-Instruct-IQ2_XXS.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ2_XXS.gguf) | IQ2_XXS | 2.39GB | Lower quality, uses SOTA techniques to be usable. |
- | [Meta-Llama-3-8B-Instruct-IQ1_M.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ1_M.gguf) | IQ1_M | 2.16GB | Extremely low quality, *not* recommended. |
- | [Meta-Llama-3-8B-Instruct-IQ1_S.gguf](https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct-IQ1_S.gguf) | IQ1_S | 2.01GB | Extremely low quality, *not* recommended. |
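One way to grab a single file is the `huggingface_hub` Python client; a minimal sketch, using the Q4_K_M row above as an example:

```python
# Download one quant file instead of cloning the whole repo.
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
)
print(path)  # local path to the cached GGUF file
```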

- ## Which file should I choose?

- A great write-up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)

- The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.

- If you want your model running as FAST as possible, you'll want to fit the whole thing in your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.

- If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then grab a quant with a file size 1-2GB smaller than that total.
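As a worked example of that rule of thumb, a small sketch; the sizes come from the table above, and the 1.5GB headroom is just one point inside the suggested 1-2GB range:

```python
# Pick the largest quant from the table that fits a memory budget with headroom.
SIZES_GB = {
    "Q8_0": 8.54, "Q6_K": 6.59, "Q5_K_M": 5.73, "Q4_K_M": 4.92,
    "IQ4_XS": 4.44, "Q3_K_M": 4.01, "IQ3_M": 3.78, "Q2_K": 3.17, "IQ2_M": 2.94,
}

def pick_quant(budget_gb: float, headroom_gb: float = 1.5) -> str:
    fits = {name: size for name, size in SIZES_GB.items()
            if size <= budget_gb - headroom_gb}
    return max(fits, key=fits.get)  # biggest file that still leaves headroom

print(pick_quant(8.0))         # 8GB of VRAM (speed)           -> Q5_K_M
print(pick_quant(8.0 + 16.0))  # plus 16GB system RAM (quality) -> Q8_0
```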

- Next, you'll need to decide whether you want an 'I-quant' or a 'K-quant'.

- If you don't want to think too much, grab one of the K-quants. These use the format 'QX_K_X', like Q5_K_M.

- If you want to get more into the weeds, you can check out this extremely useful feature chart:

- [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)

- But basically, if you're aiming for below Q4 and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These use the format IQX_X, like IQ3_M. They are newer and offer better performance for their size.

- The I-quants can also be used on CPU and Apple Metal, but they will be slower than their K-quant equivalents, so speed vs. quality is a tradeoff you'll have to decide on.

- The I-quants are *not* compatible with Vulkan, which also supports AMD cards, so if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.

- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
 
  quantized_by: bartowski
+ lm_studio:
+   param_count: 8b
+   use_case: general
+   release_date: 18-04-2024
+   model_creator: meta-llama
+   prompt_template: Llama 3
+   system_prompt: You are a helpful AI assistant.
+   base_model: llama
+   original_repo: meta-llama/Meta-Llama-3-8B-Instruct
  ---

+ ## 💫 Community Model> Llama 3 8B Instruct by Meta

+ *👾 [LM Studio](https://lmstudio.ai) Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on [Discord](https://discord.gg/aPQfnNkxGC)*.

+ **Model creator:** [meta-llama](https://huggingface.co/meta-llama)<br>
+ **Original model:** [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)<br>
+ **GGUF quantization:** provided by [bartowski](https://huggingface.co/bartowski) based on `llama.cpp` release [b2777](https://github.com/ggerganov/llama.cpp/releases/tag/b2777)<br>

+ ## Model Summary:
+ Llama 3 represents a huge update to the Llama family of models. This model is the 8B-parameter instruction-tuned version, meaning it's small, fast, and tuned for following instructions.<br>
+ This model is very happy to follow its given system prompt, so use that to your advantage to get the behavior you desire.<br>
+ Llama 3 excels at general usage situations, including multi-turn conversations, general world knowledge, and coding.<br>
+ This 8B model exceeds the performance of Llama 2's 70B model, a huge leap over the previous generation.

+ This model was converted with the BPE tokenizer fixes from llama.cpp.
+ ## Prompt Template:

+ Choose the 'Llama 3' preset in your LM Studio.

+ Under the hood, the model will see a prompt that's formatted like so:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```
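Outside LM Studio, the same files work with any llama.cpp-based runtime. A minimal sketch using the `llama-cpp-python` bindings, assuming a build recent enough to ship the `llama-3` chat handler and a downloaded Q4_K_M file:

```python
# Chat with a downloaded quant via llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
    chat_format="llama-3",  # applies the prompt template shown above
    n_ctx=8192,
)
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Write a function that computes the nth Fibonacci number."},
    ],
)
print(result["choices"][0]["message"]["content"])
```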

+ ## Use case and examples

+ Llama 3 should be great for anything you throw at it. Try it with conversations, coding, and all-around general inquiries.

+ ## Creative conversations

+ Using a system prompt of `You are a pirate chatbot who always responds in pirate speak!`

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/PYIhzOZtKVSHEUq24u3ll.png)

+ ## General knowledge

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/3XDcR9e10CxcdVhmeco_W.png)
 
+ ## Coding

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6435718aaaef013d1aec3b8b/l-AHfv39hXG9IPzKqIBpv.png)

+ ## Technical Details

+ Llama 3 was trained on over 15T tokens from a massively diverse range of subjects and languages, and its training data includes 4 times more code than Llama 2's.

+ This model also features grouped-query attention (GQA), so memory usage scales gracefully over large contexts.
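To see why that helps, a back-of-the-envelope sketch using Llama 3 8B's published head counts (32 query heads, 8 KV heads); this is illustrative arithmetic, not code from the model:

```python
# KV-cache size per token per layer: GQA stores keys/values for only 8 KV heads,
# which the 32 query heads share in groups of 4 (Llama 3 8B's configuration).
n_query_heads, n_kv_heads, head_dim = 32, 8, 128

mha_floats = 2 * n_query_heads * head_dim  # full multi-head attention: 8192 floats
gqa_floats = 2 * n_kv_heads * head_dim     # grouped-query attention:   2048 floats
print(f"KV cache shrinks {mha_floats // gqa_floats}x per token per layer")  # 4x
```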

+ Instruction fine-tuning was performed with a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO).

+ Check out Meta's blog post for more information [here](https://ai.meta.com/blog/meta-llama-3/).
 
+ ## Special thanks

+ 🙏 Special thanks to [Georgi Gerganov](https://github.com/ggerganov) and the whole team working on [llama.cpp](https://github.com/ggerganov/llama.cpp/) for making all of this possible.

+ 🙏 Special thanks to [Kalomaze](https://github.com/kalomaze) for his dataset (linked [here](https://github.com/ggerganov/llama.cpp/discussions/5263)), which was used for calculating the imatrix for these quants and improves the overall quality!

+ ## Disclaimers

+ LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated it. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for them. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, virus-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or your use of any other Community Model provided by or through LM Studio.