Committed by TheBloke
Commit 14a4473
Parent: 266a900

Update README.md

Files changed (1): README.md (+51 -18)

README.md CHANGED
@@ -1,6 +1,18 @@
 ---
+language:
+- en
+library_name: transformers
+tags:
+- gpt
+- llm
+- large language model
+- h2o-llmstudio
 inference: false
-license: other
+thumbnail: >-
+  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
 ---
 
 <!-- header start -->
@@ -21,7 +33,7 @@ license: other
 
 These files are GPTQ 4bit model files for [H2O's GPT-GM-OASST1-Falcon 40B v2](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2).
 
-It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
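
For context, a quantisation along these lines can be reproduced with AutoGPTQ's standard `BaseQuantizeConfig` / `quantize` / `save_quantized` flow. This is a minimal sketch only: the calibration text and output directory are placeholders, and the exact procedure used for this repo is not shown in the commit. The parameters mirror the "Provided files" section below (4-bit, groupsize -1, act-order).

```python
# Minimal AutoGPTQ quantisation sketch (assumption: standard AutoGPTQ API;
# calibration text and output directory are illustrative placeholders).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2"

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# 4-bit, no grouping (group_size=-1), act-order (desc_act=True)
quantize_config = BaseQuantizeConfig(bits=4, group_size=-1, desc_act=True)

# Loading a 40B model for quantisation needs a very large amount of memory.
model = AutoGPTQForCausalLM.from_pretrained(base_model,
                                            quantize_config=quantize_config,
                                            trust_remote_code=True)

# Calibration data: tokenised example passages (placeholder text here).
examples = [tokenizer("Tell me about AI and how it is used today.")]
model.quantize(examples)

model.save_quantized("h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ",
                     use_safetensors=True)
```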
 
 ## Repositories available
 
@@ -29,20 +41,33 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2)
 
-## How to easily download and use this model in text-generation-webui
-
-Please make sure you're using the latest version of text-generation-webui
-
-1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ`.
-3. Click **Download**.
-4. The model will start downloading. Once it's finished it will say "Done"
-5. In the top left, click the refresh icon next to **Model**.
-6. In the **Model** dropdown, choose the model you just downloaded: `h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ`
-7. The model will automatically load, and is now ready for use!
-8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
-  * Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
-9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
+## Prompt template
+
+```
+<|prompt|>prompt<|endoftext|>
+<|answer|>
+```
+
+## EXPERIMENTAL
+
+Please note this is an experimental GPTQ model. Support for it is currently quite limited.
+
+It is also expected to be **VERY SLOW**. This is unavoidable at the moment, but is being looked at.
+
+## How to download and use this model in text-generation-webui
+
+1. Launch text-generation-webui.
+2. Click the **Model tab**.
+3. Untick **Autoload model**.
+4. Under **Download custom model or LoRA**, enter `TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ`.
+5. Click **Download**.
+6. Wait until it says it's finished downloading.
+7. Click the **Refresh** icon next to **Model** in the top left.
+8. In the **Model** drop-down, choose the model you just downloaded: `TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ`.
+9. Make sure **Loader** is set to **AutoGPTQ**. This model will not work with ExLlama or GPTQ-for-LLaMa.
+10. Tick **Trust Remote Code**, followed by **Save Settings**.
+11. Click **Reload**.
+12. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
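
The download in the steps above can also be scripted. A minimal sketch, assuming the `huggingface_hub` package is installed (the `repo_id` matches the one entered in step 4); this is not part of the committed README:

```python
# Scripted alternative to the manual webui download steps above
# (assumes the huggingface_hub package is installed).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ"
)
print(f"Model files are in: {local_dir}")
```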
 
 
 
 ## How to use this GPTQ model from Python code
 
@@ -55,7 +80,6 @@ Then try the following example code:
 ```python
 from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
-import argparse
 
 model_name_or_path = "TheBloke/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2-GPTQ"
 model_basename = "gptq_model-4bit--1g"
@@ -67,15 +91,14 @@ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         model_basename=model_basename,
         use_safetensors=True,
-        trust_remote_code=False,
+        trust_remote_code=True,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''USER: {prompt}
-ASSISTANT:'''
+prompt_template=f'''<|prompt|>{prompt}<|endoftext|><|answer|>'''
 
 print("\n\n*** Generate:")
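
The committed example is truncated at the `*** Generate:` marker above. A minimal sketch of one way it can continue, assuming the standard `transformers` `generate()` API; the sampling values are illustrative, not taken from this commit:

```python
# One plausible continuation of the truncated example above
# (standard transformers generate() usage; sampling values are illustrative).
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```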
 
@@ -117,6 +140,16 @@ It was created without group_size to lower VRAM requirements, and with --act-order
 * Works with text-generation-webui, including one-click-installers.
 * Parameters: Groupsize = -1. Act Order / desc_act = True.
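
For reference, a `quantize_config.json` matching these parameters would look roughly like the following sketch (shown as a Python dict; the actual file in the repo may carry additional fields):

```python
# Roughly what quantize_config.json encodes for this repo, per the
# parameters above (abridged; the real file may include more fields).
quantize_config = {
    "bits": 4,         # 4-bit quantisation
    "group_size": -1,  # Groupsize = -1: no grouping, lower VRAM
    "desc_act": True,  # Act Order enabled
}
```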
 
+## FAQ
+
+### About `trust-remote-code`
+
+Please be aware that this command line argument causes Python code provided by Falcon to be executed on your machine.
+
+This code is required at the moment because Falcon is too new to be supported by Hugging Face transformers. At some point in the future transformers will support the model natively, and then `trust_remote_code` will no longer be needed.
+
+In this repo you can see two `.py` files - these are the files that get executed. They are copied from the base repo at [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct).
+
 <!-- footer start -->
 ## Discord
 
 