Transformers
English
falcon
text-generation-inference
TheBloke committed
Commit b93b1bf
1 Parent(s): 2ba684d

Update README.md

Files changed (1)
  1. README.md +12 -5
README.md CHANGED
@@ -25,9 +25,12 @@ license: apache-2.0

These files are **experimental** GGML format model files for [Falcon 40B Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct).

- These GGML files will **not** work in llama.cpp, and at the time of writing they will not work with any UI or library. They cannot be used from Python code.
+ They cannot be used with text-generation-webui, llama.cpp, or KoboldCpp at this time.

- They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp)
+ They can be used with:
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui).
+ * The ctransformers Python library, which includes LangChain support: [ctransformers](https://github.com/marella/ctransformers).
+ * A new fork of llama.cpp that introduced this Falcon GGML support: [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp).

## Repositories available

@@ -35,11 +38,15 @@ They can be used with a new fork of llama.cpp that adds Falcon GGML support: [cm
* [3-bit GPTQ model for GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-3bit-GPTQ)
* [2, 3, 4, 5, 6, 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/falcon-40b-instruct-GGML)
* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/tiiuae/falcon-40b-instruct)
-
+
<!-- compatibility_ggml start -->
## Compatibility

- To build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:
+ The recommended UI for these GGMLs is [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui). Preliminary CUDA GPU acceleration is provided.
+
+ For use from Python code, use [ctransformers](https://github.com/marella/ctransformers), again with preliminary CUDA GPU acceleration.
+
+ Or to build cmp-nct's fork of llama.cpp with Falcon 40B support plus preliminary CUDA acceleration, please try the following steps:

```
git clone https://github.com/cmp-nct/ggllm.cpp
@@ -51,7 +58,7 @@ Compiling on Windows: developer cmp-nct notes: 'I personally compile it using VS

Once compiled you can then use `bin/falcon_main` just like you would use llama.cpp. For example:
```
- bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q3_K_S.bin -p "What is a falcon?\n### Response:"
+ bin/falcon_main -t 8 -ngl 100 -b 1 -m falcon40b-instruct.ggmlv3.q4_0.bin -p "What is a falcon?\n### Response:"
```

You can specify `-ngl 100` regardless of your VRAM, as it will automatically detect how much VRAM is available to be used.
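
For the ctransformers route the updated README recommends for Python use, here is a minimal, unofficial sketch. The `model_file` name and the `gpu_layers` value are assumptions for illustration only — check the GGML repo's file list and your available VRAM before copying them:

```
# Minimal sketch: load one of the Falcon GGML quantisations with ctransformers.
# model_file and gpu_layers below are assumed values, not taken from the repo listing.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-40b-instruct-GGML",
    model_file="falcon40b-instruct.ggmlv3.q4_0.bin",  # assumed quantisation file
    model_type="falcon",
    gpu_layers=50,  # preliminary CUDA offload; use 0 for CPU-only inference
)

# Prompt format follows the README's falcon_main example.
print(llm("What is a falcon?\n### Response:", max_new_tokens=200))
```

As with the `falcon_main` example, GPU offload is optional; without CUDA the model runs on CPU only, which for a 40B model requires substantial RAM.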
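The "can be used with" list also mentions LangChain support via ctransformers. A hedged sketch of that path using LangChain's `CTransformers` wrapper; again the file name and config values are placeholders rather than values taken from this repo:

```
# Sketch of the LangChain integration noted in the README, via the
# CTransformers LLM wrapper. File name and config values are assumptions.
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/falcon-40b-instruct-GGML",
    model_file="falcon40b-instruct.ggmlv3.q4_0.bin",  # assumed quantisation file
    model_type="falcon",
    config={"max_new_tokens": 200, "temperature": 0.7},
)

print(llm("What is a falcon?\n### Response:"))
```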