TheBloke committed on
Commit 87c7b85
1 Parent(s): 4ea0190

Update README.md

Files changed (1)
  1. README.md +22 -24
README.md CHANGED
@@ -35,13 +35,22 @@ tags:
 
 This repo contains GGML format model files for [Meta's Llama 2 70B Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat).
 
- GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as:
- * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with full GPU acceleration out of the box. Especially good for storytelling.
- * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with GPU acceleration via the c_transformers backend.
- * [LM Studio](https://lmstudio.ai/), a fully featured local GUI. Supports full GPU acceleration on macOS. Also supports Windows, without GPU acceleration.
- * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Requires extra steps to enable GPU acceleration via the llama.cpp backend.
- * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
 
 ## Repositories available
 
@@ -61,15 +70,11 @@ You are a helpful, respectful and honest assistant. Always answer as helpfully a
 <!-- compatibility_ggml start -->
 ## Compatibility
 
- ### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
-
- These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
-
- ### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
-
- These new quantisation methods are compatible with llama.cpp as of June 6th, commit `2d43387`.
-
- They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
 
 ## Explanation of the new k-quant methods
 <details>
@@ -109,17 +114,11 @@ Refer to the Provided Files table below to see what files use which methods, and
 I use the following command line; adjust for your tastes and needs:
 
 ```
- ./main -t 10 -ngl 32 -m llama-2-70b-chat.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
 ```
- Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.
-
- Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
- If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
-
- ## How to run in `text-generation-webui`
-
- Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
 
 <!-- footer start -->
 ## Discord
@@ -145,7 +144,6 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 
 **Patreon special mentions**: Slarti, Chadd, John Detwiler, Pieter, zynix, K, Mano Prime, ReadyPlayerEmma, Ai Maven, Leonard Tan, Edmond Seymore, Joseph William Delisle, Luke @flexchar, Fred von Graf, Viktor Bowallius, Rishabh Srivastava, Nikolai Manek, Matthew Berman, Johann-Peter Hartmann, ya boyyy, Greatston Gnanesh, Femi Adebogun, Talal Aujan, Jonathan Leane, terasurfer, David Flickinger, William Sang, Ajan Kanaga, Vadim, Artur Olbinski, Raven Klaugh, Michael Levine, Oscar Rangel, Randy H, Cory Kujawski, RoA, Dave, Alex, Alexandros Triantafyllidis, Fen Risland, Eugene Pentland, vamX, Elle, Nathan LeClaire, Khalefa Al-Ahmad, Rainer Wilmers, subjectnull, Junyu Yang, Daniel P. Andersen, SuperWojo, LangChain4j, Mandus, Kalila, Illia Dulskyi, Trenton Dambrowitz, Asp the Wyvern, Derek Yates, Jeffrey Morgan, Deep Realms, Imad Khwaja, Pyrater, Preetika Verma, biorpg, Gabriel Tamborski, Stephen Murray, Spiking Neurons AB, Iucharbius, Chris Smitley, Willem Michiel, Luke Pendergrass, Sebastain Graf, senxiiz, Will Dee, Space Cruiser, Karl Bernard, Clay Pascal, Lone Striker, transmissions 11, webtim, WelcomeToTheClub, Sam, theTransient, Pierre Kircher, chris gileta, John Villwock, Sean Connelly, Willian Hasse
 
-
 Thank you to all my generous patrons and donaters!
 
 <!-- footer end -->
 
 This repo contains GGML format model files for [Meta's Llama 2 70B Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat).
 
+ ## Only compatible with latest llama.cpp
+
+ To use these files you need:
+
+ 1. llama.cpp as of [commit `e76d630`](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb) or later.
+    - If you don't want to compile from source, you can use the binaries from [release master-e76d630](https://github.com/ggerganov/llama.cpp/releases/tag/master-e76d630).
+ 2. To pass the new command line parameter `-gqa 8`.
+
+ Example command:
+ ```
+ /workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
+ ```
+
+ There is no CUDA support at this time, but it should hopefully be coming soon.
+
+ There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.
 
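The `-p` value in the example command above is just the Llama 2 chat template written out inline. As a rough sketch, it can be assembled in the shell like this (the `SYSTEM`, `USER_MSG`, and `PROMPT` variable names are illustrative, not llama.cpp options):

```shell
# Assemble the Llama 2 chat prompt used in the example command above.
SYSTEM="You are a helpful assistant"
USER_MSG="Write a story about llamas"
PROMPT="[INST] <<SYS>>${SYSTEM}<</SYS>>${USER_MSG}[/INST]"
echo "$PROMPT"
```

The result can then be passed to `main` as `-p "$PROMPT"`.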
 ## Repositories available
 
 
 <!-- compatibility_ggml start -->
 ## Compatibility
 
+ ### Only compatible with llama.cpp as of commit `e76d630`
+
+ Compatible with llama.cpp as of [commit `e76d630`](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb) or later.
+
+ For a pre-compiled release, use [release master-e76d630](https://github.com/ggerganov/llama.cpp/releases/tag/master-e76d630) or later.
 
 
 
 
 
 ## Explanation of the new k-quant methods
 <details>
 
 I use the following command line; adjust for your tastes and needs:
 
 ```
+ ./main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
 ```
+ Change `-t 13` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.
 
 
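One way to find that physical core count on Linux is via `lscpu` (on macOS, `sysctl -n hw.physicalcpu` reports the same number):

```shell
# Count unique physical cores (Linux; lscpu is part of util-linux).
# Each unique CORE,SOCKET pair in the parseable output is one physical core.
cores=$(lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l)
echo "Use -t $cores"
```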
+ No GPU support is possible yet, but it is coming soon.
 