Update README.md
README.md CHANGED
@@ -118,37 +118,23 @@ Refer to the Provided Files table below to see what files use which methods, and
**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.

-### 
+### All files are split and require joining after download

-**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded
+**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded all files as split files

<details>
-<summary>Click for instructions regarding 
+<summary>Click for instructions regarding joining files</summary>

-
-Please download:
-* `falcon-180b-chat.Q6_K.gguf-split-a`
-* `falcon-180b-chat.Q6_K.gguf-split-b`
-
-### q8_0
-Please download:
-* `falcon-180b-chat.Q8_0.gguf-split-a`
-* `falcon-180b-chat.Q8_0.gguf-split-b`
-
-To join the files, do the following:
+To join the files, use the following example for each file you're interested in:

Linux and macOS:
```
-cat falcon-180b-chat.
-cat falcon-180b-chat.Q8_0.gguf-split-* > falcon-180b-chat.Q8_0.gguf && rm falcon-180b-chat.Q8_0.gguf-split-*
+cat falcon-180b-chat.Q2_K.gguf-split-* > falcon-180b-chat.Q2_K.gguf && rm falcon-180b-chat.Q2_K.gguf-split-*
```
Windows command line:
```
-COPY /B falcon-180b-chat.
-del falcon-180b-chat.
-
-COPY /B falcon-180b-chat.Q8_0.gguf-split-a + falcon-180b-chat.Q8_0.gguf-split-b falcon-180b-chat.Q8_0.gguf
-del falcon-180b-chat.Q8_0.gguf-split-a falcon-180b-chat.Q8_0.gguf-split-b
+COPY /B falcon-180b-chat.Q2_K.gguf-split-a + falcon-180b-chat.Q2_K.gguf-split-b falcon-180b-chat.Q2_K.gguf
+del falcon-180b-chat.Q2_K.gguf-split-a falcon-180b-chat.Q2_K.gguf-split-b
```

</details>
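The commands above join one quantisation at a time. Below is a minimal bash sketch, not part of the README itself, that loops over every split pair in the current directory and joins each one; the loop, the `part_a`/`base` variable names, and the assumption that every part follows the `<name>.gguf-split-a` / `<name>.gguf-split-b` naming shown above are mine.

```
#!/usr/bin/env bash
# Join every *.gguf-split-* pair in the current directory into a single .gguf file.
# Assumes the two-part naming used above: <name>.gguf-split-a and <name>.gguf-split-b.
set -euo pipefail

for part_a in ./*.gguf-split-a; do
  [ -e "$part_a" ] || continue        # no split files present; nothing to do
  base="${part_a%-split-a}"           # e.g. ./falcon-180b-chat.Q6_K.gguf
  echo "Joining ${base} ..."
  cat "${base}"-split-* > "${base}"   # glob expands in order: -split-a, then -split-b
  rm "${base}"-split-*                # remove the parts once the join has completed
done
```

After joining, `head -c 4 falcon-180b-chat.Q2_K.gguf` should print `GGUF`, the four magic bytes that begin every GGUF file; if it prints anything else, the download or join most likely went wrong.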
@@ -162,7 +148,7 @@ Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6f
For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.

```
-./main -t 10 -ngl 32 -m falcon-180b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "User:
+./main -t 10 -ngl 32 -m falcon-180b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "User: Write a story about llamas\nAssistant:"
```
Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`. If offloading all layers to GPU, set `-t 1`.
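The note above says to match `-t` to your physical core count, and the hunk header references a specific llama.cpp commit. The sketch below is a hedged illustration for a typical Linux setup rather than part of the README: the clone and `make` build steps, the `PHYS_CORES` helper, and the `lscpu` pipeline are my own assumptions, while the `./main` flags and prompt come from the example above.

```
# Build llama.cpp at the commit referenced above
# (plain CPU `make` assumed; build with GPU support enabled if you want -ngl offload to work).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 6381d4e110bd0ec02843a60bbeb8b6f   # commit prefix as shown in the hunk header
make

# Use physical cores (unique core/socket pairs), not hyper-threads, for -t.
PHYS_CORES=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)

# Assumes the joined .gguf file sits in this directory.
./main -t "$PHYS_CORES" -ngl 32 -m falcon-180b-chat.q4_K_M.gguf --color -c 4096 \
  --temp 0.7 --repeat_penalty 1.1 -n -1 \
  -p "User: Write a story about llamas\nAssistant:"
```

If all layers fit on the GPU, set `-t 1` instead, as the README advises.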