Update for Transformers GPTQ support

Files changed:
- README.md (+21 -14)
- config.json (+32 -22)
- selfee-13b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors (+2 -2)
- quantize_config.json (+7 -6)
README.md
CHANGED
@@ -4,17 +4,20 @@ license: other
 ---
 
 <!-- header start -->
-
-
+<!-- 200823 -->
+<div style="width: auto; margin-left: auto; margin-right: auto">
+<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
-<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
+<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
 </div>
 </div>
+<div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
 <!-- header end -->
 
 # Kaist AI's Selfee 13B GPTQ
@@ -72,11 +75,12 @@ It was created with group_size 128 to increase inference accuracy, but without -
 * Parameters: Groupsize = 128. Act Order / desc_act = False.
 
 <!-- footer start -->
+<!-- 200823 -->
 ## Discord
 
 For further support, and discussions on these models and AI in general, join us at:
 
-[TheBloke AI's Discord server](https://discord.gg/
+[TheBloke AI's Discord server](https://discord.gg/theblokeai)
 
 ## Thanks, and how to contribute.
 
@@ -91,12 +95,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
-**Special thanks to**:
+**Special thanks to**: Aemon Algiz.
 
-**Patreon special mentions**: Derek Yates, Sean Connelly, Luke, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, trip7s trip, Jonathan Leane, Talal Aujan, Artur Olbinski, Cory Kujawski, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Johann-Peter Hartmann.
+
+**Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
 
 Thank you to all my generous patrons and donaters!
 
+And thank you again to a16z for their generous grant.
+
 <!-- footer end -->
 
 # Original model card: Kaist AI's Selfee 13B
@@ -140,7 +147,7 @@ For other datsets, we do not need special data collection method.
 To train our model with high-quality instructions and answer pairs, we utilized data augmentation using OpenAI API calls. The process involved three steps. <br>
 Firstly, we collected various instructions from multiple fields and fed them to ChatGPT to generate answers. <br>
 Secondly, we gathered feedback on the generated answer by querying ChatGPT again and asked it to determine if the initial answer required any revision. <br>
-Thirdly, if a revision was necessary, we passed the instruction, initial answer, and feedback pair to ChatGPT to generate a revised answer and its feedback pair.
+Thirdly, if a revision was necessary, we passed the instruction, initial answer, and feedback pair to ChatGPT to generate a revised answer and its feedback pair.
 We repeated the process until we received feedback that required no further revision or hit the maximum iteration. However, due to the token limitation of the ChatGPT API, we had to truncate some instances that needed more than 4096 tokens while augmenting.<br>
 You can see the details with command [here](data_augmentation/README.md).<br>
 *We provide the whole dataset after collection and augmentation using huggingface([code](data_collection/download_train.py)), so you can either use the code or follow our [data merging step](outputs/README.md) to replicate the training dataset. Feel free to use any of them!
@@ -202,17 +209,17 @@ python inference/apply_delta.py --path_raw {path_to_llama_7b} --path_tuned /ckpt
 
 Because SelFee is trained to generate iterative feedback and revisions until the response is satisfying, it automatically generates iterative feedback and revisions on a single forward pass. The model autonomously decides when to stop generating revisions based on the feedback. If the feedback chain ends with sequences like `Revision is not needed.`, the model autonomously terminates generation. <br>
 
-For autonomous inference mode,
+For autonomous inference mode,
 
 ```
-python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_autonomous.jsonl"
+python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_autonomous.jsonl"
 ```
 
 
 <b>Revision Enforce Inference Mode</b><br>
-We observed that increasing the minimum number of required revisions corresponds to a corresponding increase in performance. To enforce revisions, we automatically replace sequences such as `Revision is not needed.` into `Revision is needed.` during self-feedback generation. Because SelFee is trained to generate `Revision {index}:` after the sequence of `Revision is needed.`, the model would continually revise the answer.
+We observed that increasing the minimum number of required revisions corresponds to a corresponding increase in performance. To enforce revisions, we automatically replace sequences such as `Revision is not needed.` into `Revision is needed.` during self-feedback generation. Because SelFee is trained to generate `Revision {index}:` after the sequence of `Revision is needed.`, the model would continually revise the answer.
 
-For revision enforce inference mode, use the `max-num-revision` argument.
+For revision enforce inference mode, use the `max-num-revision` argument.
 
 ```
 python inference/inference.py --model-path "ckpt/selfee-7b" --model-id "selfee" --question-file "evaluation/template/question.jsonl" --answer-file "evaluation/answer/selfee_7b_enforce_3_revision.jsonl" --max-num-revision 3
@@ -231,7 +238,7 @@ First, you need to get your API key to get access to the GPT-4 API.
 export OPENAI_API_KEYS={personal_key}
 ```
 
-To compare the performance of a generation result (for example, located on `evaluation/answer/file_A.jsonl`) with another generation result (located on `evaluation/anwer/file_B.jsonl`),
+To compare the performance of a generation result (for example, located on `evaluation/answer/file_A.jsonl`) with another generation result (located on `evaluation/anwer/file_B.jsonl`),
 
 
 ```
@@ -244,7 +251,7 @@ To mitigate the positional bias of GPT-4 model, we apply a bidirectional evaluat
 python evaluation/gpt4_automatic_evaluation.py -q evaluation/template/question.jsonl -a evaluation/answer/file_B.jsonl evaluation/answer/file_A.jsonl -p evaluation/template/prompt.jsonl -r evaluation/template/reviewer.jsonl -o evaluation/review/B_vs_A.jsonl
 ```
 
-## Limitations
+## Limitations
 Similar to other LLaMA-finetuned models, SelFee also make some mistakes especially for math, reasoning, factuality, and coding tasks. Although our performance outperforms ChatGPT on Vicuna setting, the evaluation setting contains some limitations in terms of comprehension (limited to 80 queries), inconsistency, and unreliability. Therefore, further research for a better evaluation setting is needed. Please take these claims with a grain of salt.
 
 ## Online demo
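The revision-enforce mode quoted in the README diff above works by rewriting the model's stop sequence (`Revision is not needed.`) into `Revision is needed.` during self-feedback generation. The actual logic lives in the upstream `inference/inference.py`; the function below is only a stand-in sketch of that string rewrite, with a hypothetical name:

```python
# Sketch of SelFee's revision-enforce trick (illustrative only; not the
# actual upstream inference code).
STOP = "Revision is not needed."
FORCE = "Revision is needed."

def enforce_revisions(feedback: str, num_done: int, max_num_revision: int) -> str:
    """If the model tries to stop before max_num_revision revisions,
    rewrite the stop sequence so it keeps emitting `Revision {index}:` blocks."""
    if feedback.endswith(STOP) and num_done < max_num_revision:
        return feedback[: -len(STOP)] + FORCE
    return feedback

print(enforce_revisions("Feedback 1: Looks fine. Revision is not needed.", 0, 3))
# → Feedback 1: Looks fine. Revision is needed.
```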
config.json
CHANGED
@@ -1,24 +1,34 @@
 {
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+  "_name_or_path": "/workspace/process/selfee-13b/delta",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 13824,
+  "max_position_embeddings": 2048,
+  "max_sequence_length": 2048,
+  "model_type": "llama",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 40,
+  "pad_token_id": 0,
+  "rms_norm_eps": 1e-06,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.30.0.dev0",
+  "use_cache": true,
+  "vocab_size": 32001,
+  "quantization_config": {
+    "bits": 4,
+    "group_size": 128,
+    "damp_percent": 0.01,
+    "desc_act": false,
+    "sym": true,
+    "true_sequential": true,
+    "model_file_base_name": "model",
+    "quant_method": "gptq"
+  }
 }
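The point of this commit is that `config.json` now embeds a `quantization_config` block mirroring the standalone `quantize_config.json` (see below), which is what Transformers-side GPTQ loading reads. A quick sanity check that the two stay in sync, with both dicts copied from this commit's diff:

```python
# Values copied from the config.json and quantize_config.json added in
# this commit; the comparison itself is just a dict diff.
embedded = {
    "bits": 4, "group_size": 128, "damp_percent": 0.01,
    "desc_act": False, "sym": True, "true_sequential": True,
    "model_file_base_name": "model", "quant_method": "gptq",
}
quantize_config = {
    "bits": 4, "group_size": 128, "damp_percent": 0.01,
    "desc_act": False, "sym": True, "true_sequential": True,
    "model_file_base_name": "model",
}

# Every key in quantize_config.json must match the embedded copy; the
# embedded copy additionally carries "quant_method".
mismatches = {k: (v, embedded.get(k)) for k, v in quantize_config.items()
              if embedded.get(k) != v}
assert not mismatches, mismatches
assert embedded["quant_method"] == "gptq"
```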
selfee-13b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c0dcd0a34cde469eb7971cf76305642da4b4f52f600010fab0b14d3789214d28
+size 8111029232
```
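What's shown in the rename diff is a Git LFS pointer file, not the weights themselves: three `key value` lines identifying the real blob. The pointer text below is copied from the new side of the diff and parsed with plain string splitting:

```python
# Parse a git-lfs pointer file (format per https://git-lfs.github.com/spec/v1).
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c0dcd0a34cde469eb7971cf76305642da4b4f52f600010fab0b14d3789214d28
size 8111029232
"""

# Each line is "key value"; split on the first space only.
pointer = dict(line.split(" ", 1) for line in pointer_text.splitlines())
algo, digest = pointer["oid"].split(":", 1)

assert algo == "sha256" and len(digest) == 64
print(f"{int(pointer['size']) / 2**30:.2f} GiB")  # → 7.55 GiB
```

So the renamed `model.safetensors` is the full ~7.55 GiB 4-bit shard; only the pointer lives in the git history.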
quantize_config.json
CHANGED
@@ -1,8 +1,9 @@
 {
-
-
-
-
-
-
+  "bits": 4,
+  "group_size": 128,
+  "damp_percent": 0.01,
+  "desc_act": false,
+  "sym": true,
+  "true_sequential": true,
+  "model_file_base_name": "model"
 }
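The `model_file_base_name` field is what ties this config to the weights rename in the same commit: AutoGPTQ-style loaders typically look for `{model_file_base_name}.safetensors`, hence the shard becoming `model.safetensors`. A minimal sketch of that filename derivation (the helper name is ours, not a real library API):

```python
# Illustrative only: derive the weight filename a GPTQ loader would look
# for from quantize_config.json's model_file_base_name field.
def expected_weight_file(quantize_config: dict, use_safetensors: bool = True) -> str:
    base = quantize_config.get("model_file_base_name") or "model"
    return base + (".safetensors" if use_safetensors else ".bin")

cfg = {"bits": 4, "group_size": 128, "model_file_base_name": "model"}
print(expected_weight_file(cfg))  # → model.safetensors
```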