Update README.md
--- a/README.md
+++ b/README.md
@@ -54,9 +54,9 @@ Once compiled you can then use `bin/falcon_main` just like you would use llama.c
 bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
 ```
 
-You can specify `-ngl 100` regardles of your VRAM, as it will automatically detect how much VRAM is available
+You can specify `-ngl 100` regardles of your VRAM, as it will automatically detect how much VRAM is available to be used.
 
-Adjust `-t 8` according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
+Adjust `-t 8` (the number of CPU cores to use) according to what performs best on your system. Do not exceed the number of physical CPU cores you have.
 
 `-b 1` reduces batch size to 1. This slightly lowers prompt evaluation time, but frees up VRAM to load more of the model on to your GPU. If you find prompt evaluation too slow and have enough spare VRAM, you can remove this parameter.
 
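
The README advises keeping `-t` at or below the number of physical CPU cores. As a rough sketch (not part of this commit), the upper bound could be found on Linux as below. The helper name `physical_cores` is hypothetical, and it assumes `lscpu` is available, falling back to the logical CPU count otherwise. The script only prints the resulting command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: count physical CPU cores as unique (core, socket) pairs.
# Assumes Linux with lscpu; falls back to the logical CPU count if that fails.
physical_cores() {
    n=""
    if command -v lscpu >/dev/null 2>&1; then
        n=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    fi
    case "$n" in
        # Empty, zero, or non-numeric result: use the logical CPU count, or 1.
        ''|0|*[!0-9]*) n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1) ;;
    esac
    echo "$n"
}

THREADS=$(physical_cores)

# Print the command from the README with -t set to the detected core count.
printf 'bin/falcon_main -t %s -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q3_K_S.bin -p "What is a falcon?\\n### Response:"\n' "$THREADS"
```

Treat the printed value as a ceiling rather than a recommendation: the README still suggests trying lower `-t` values to see what performs best on your system.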