Commit 7413c35 by TheBloke · 1 Parent(s): f31023b

Update for Transformers GPTQ support

README.md CHANGED
@@ -11,17 +11,20 @@ datasets:
  ---
  
  <!-- header start -->
- <div style="width: 100%;">
- <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
- <p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
- <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
  <!-- header end -->
  
  # Pankaj Mathur's Orca Mini 7B GPTQ
@@ -150,6 +153,7 @@ It was created with group_size 128 to increase inference accuracy, but without -
  * Parameters: Groupsize = 128. Act Order / desc_act = False.
  
  <!-- footer start -->
  ## Discord
  
  For further support, and discussions on these models and AI in general, join us at:
@@ -169,12 +173,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
  
- **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
  
- **Patreon special mentions**: Pyrater, WelcomeToTheClub, Kalila, Mano Prime, Trenton Dambrowitz, Spiking Neurons AB, Pierre Kircher, Fen Risland, Kevin Schuppel, Luke, Rainer Wilmers, vamX, Gabriel Puliatti, Alex , Karl Bernard, Ajan Kanaga, Talal Aujan, Space Cruiser, ya boyyy, biorpg, Johann-Peter Hartmann, Asp the Wyvern, Ai Maven, Ghost , Preetika Verma, Nikolai Manek, trip7s trip, John Detwiler, Fred von Graf, Artur Olbinski, subjectnull, John Villwock, Junyu Yang, Rod A, Lone Striker, Chris McCloskey, Iucharbius , Matthew Berman, Illia Dulskyi, Khalefa Al-Ahmad, Imad Khwaja, chris gileta, Willem Michiel, Greatston Gnanesh, Derek Yates, K, Alps Aficionado, Oscar Rangel, David Flickinger, Luke Pendergrass, Deep Realms, Eugene Pentland, Cory Kujawski, terasurfer , Jonathan Leane, senxiiz, Joseph William Delisle, Sean Connelly, webtim, zynix , Nathan LeClaire.
  
  Thank you to all my generous patrons and donaters!
  
  <!-- footer end -->
  
  # Original model card: Pankaj Mathur's Orca Mini 7B
@@ -234,12 +241,12 @@ model = LlamaForCausalLM.from_pretrained(
  
  #generate text function
  def generate_text(system, instruction, input=None):
-
      if input:
          prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
      else:
          prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
-
      tokens = tokenizer.encode(prompt)
      tokens = torch.LongTensor(tokens).unsqueeze(0)
      tokens = tokens.to('cuda')
@@ -249,14 +256,14 @@ def generate_text(system, instruction, input=None):
      length = len(tokens[0])
      with torch.no_grad():
          rest = model.generate(
-             input_ids=tokens,
-             max_length=length+instance['generate_len'],
-             use_cache=True,
-             do_sample=True,
              top_p=instance['top_p'],
              temperature=instance['temperature'],
              top_k=instance['top_k']
-         )
      output = rest[0][length:]
      string = tokenizer.decode(output, skip_special_tokens=True)
      return f'[!] Response: {string}'
 
  ---
  
  <!-- header start -->
+ <!-- 200823 -->
+ <div style="width: auto; margin-left: auto; margin-right: auto">
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
  </div>
  <div style="display: flex; justify-content: space-between; width: 100%;">
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
  </div>
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
  </div>
  </div>
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
  <!-- header end -->
  
  # Pankaj Mathur's Orca Mini 7B GPTQ
 
  * Parameters: Groupsize = 128. Act Order / desc_act = False.
  
  <!-- footer start -->
+ <!-- 200823 -->
  ## Discord
  
  For further support, and discussions on these models and AI in general, join us at:
 
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
  
+ **Special thanks to**: Aemon Algiz.
+
+ **Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
  
  
  Thank you to all my generous patrons and donaters!
  
+ And thank you again to a16z for their generous grant.
+
  <!-- footer end -->
  
  # Original model card: Pankaj Mathur's Orca Mini 7B
 
  
  #generate text function
  def generate_text(system, instruction, input=None):
+
      if input:
          prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
      else:
          prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
+
      tokens = tokenizer.encode(prompt)
      tokens = torch.LongTensor(tokens).unsqueeze(0)
      tokens = tokens.to('cuda')
 
      length = len(tokens[0])
      with torch.no_grad():
          rest = model.generate(
+             input_ids=tokens,
+             max_length=length+instance['generate_len'],
+             use_cache=True,
+             do_sample=True,
              top_p=instance['top_p'],
              temperature=instance['temperature'],
              top_k=instance['top_k']
+         )
      output = rest[0][length:]
      string = tokenizer.decode(output, skip_special_tokens=True)
      return f'[!] Response: {string}'
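
The prompt template that `generate_text` assembles in the hunk above can be checked without loading the model. The following is a minimal sketch: `build_prompt` is a hypothetical helper name that does not appear in the model card, and only the two template strings are taken from the README.

```python
from typing import Optional


def build_prompt(system: str, instruction: str, input: Optional[str] = None) -> str:
    """Assemble the prompt string used by generate_text in the README.

    build_prompt is a hypothetical helper; only the template strings
    come from the model card.
    """
    if input:
        # Variant with an extra "### Input:" section.
        return (
            f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
            f"### Input:\n{input}\n\n### Response:\n"
        )
    # Variant without additional input.
    return f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"


prompt = build_prompt("You are a helpful assistant.", "Name three primary colors.")
print(prompt)
```

Keeping the template in a pure function like this makes it easy to confirm that the string ends with `### Response:\n` before it is tokenized.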
config.json CHANGED
@@ -1,23 +1,33 @@
  {
-   "_name_or_path": "openlm-research/open_llama_7b",
-   "architectures": [
-     "LlamaForCausalLM"
-   ],
-   "bos_token_id": 1,
-   "eos_token_id": 2,
-   "hidden_act": "silu",
-   "hidden_size": 4096,
-   "initializer_range": 0.02,
-   "intermediate_size": 11008,
-   "max_position_embeddings": 2048,
-   "model_type": "llama",
-   "num_attention_heads": 32,
-   "num_hidden_layers": 32,
-   "pad_token_id": 0,
-   "rms_norm_eps": 1e-06,
-   "tie_word_embeddings": false,
-   "torch_dtype": "float32",
-   "transformers_version": "4.29.1",
-   "use_cache": true,
-   "vocab_size": 32000
  }
 
  {
+   "_name_or_path": "openlm-research/open_llama_7b",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 11008,
+   "max_position_embeddings": 2048,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "pad_token_id": 0,
+   "rms_norm_eps": 1e-06,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.29.1",
+   "use_cache": true,
+   "vocab_size": 32000,
+   "quantization_config": {
+     "bits": 4,
+     "group_size": 128,
+     "damp_percent": 0.01,
+     "desc_act": false,
+     "sym": true,
+     "true_sequential": true,
+     "model_file_base_name": "model",
+     "quant_method": "gptq"
+   }
  }
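
The new `quantization_config` block in `config.json` appears to repeat the contents of `quantize_config.json` and add a `quant_method` key. A minimal sketch for checking that the two stay in sync is shown here; the JSON literals are copied from the "+" side of the two hunks in this commit, and the variable names are illustrative only.

```python
import json

# Copied from the "+" side of the config.json hunk (only the new block is kept).
config_json = """
{
  "model_type": "llama",
  "quantization_config": {
    "bits": 4,
    "group_size": 128,
    "damp_percent": 0.01,
    "desc_act": false,
    "sym": true,
    "true_sequential": true,
    "model_file_base_name": "model",
    "quant_method": "gptq"
  }
}
"""

# Copied from the "+" side of the quantize_config.json hunk.
quantize_config_json = """
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "sym": true,
  "true_sequential": true,
  "model_file_base_name": "model"
}
"""

embedded = json.loads(config_json)["quantization_config"]
standalone = json.loads(quantize_config_json)

# Keys present only in the embedded block.
only_in_config = {k: v for k, v in embedded.items() if k not in standalone}
# Keys whose values disagree between the two files.
mismatched = {k for k in standalone if embedded.get(k) != standalone[k]}

print(only_in_config)  # {'quant_method': 'gptq'}
print(mismatched)      # set()
```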
orca-mini-7b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4053265966b39eca75b7d159a6e355815b71f0c01e43df2a50d7ba9bca4e113f
- size 4520875496
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:96ddd10779a0505dd15aaca2463a332ea5b5048d351a2ecfc3ec09af2e4d278f
+ size 4520875552
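
The renamed file is tracked as a Git LFS pointer, so the diff shows only the pointer's `oid` and `size` fields rather than the weights themselves. As a rough sketch, the two pointers from this hunk can be parsed and compared as follows; `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is a byte count
    return fields


old_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4053265966b39eca75b7d159a6e355815b71f0c01e43df2a50d7ba9bca4e113f\n"
    "size 4520875496\n"
)
new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:96ddd10779a0505dd15aaca2463a332ea5b5048d351a2ecfc3ec09af2e4d278f\n"
    "size 4520875552\n"
)

size_delta = new_pointer["size"] - old_pointer["size"]
print(size_delta)                                 # 56
print(old_pointer["oid"] != new_pointer["oid"])   # True
```

The changed `oid` and the 56-byte size difference indicate that the file contents changed along with the name; the diff itself does not say what those extra bytes are.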
quantize_config.json CHANGED
@@ -1,8 +1,9 @@
  {
-   "bits": 4,
-   "group_size": 128,
-   "damp_percent": 0.01,
-   "desc_act": false,
-   "sym": true,
-   "true_sequential": true
  }
 
  {
+   "bits": 4,
+   "group_size": 128,
+   "damp_percent": 0.01,
+   "desc_act": false,
+   "sym": true,
+   "true_sequential": true,
+   "model_file_base_name": "model"
  }
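
The added `model_file_base_name` value lines up with the rename to `model.safetensors` earlier in this commit. The sketch below assumes, without confirmation from the diff, that a loader builds the weights file name as base name plus extension; `expected_weights_file` is a hypothetical helper, not a library function.

```python
import json

# Copied from the "+" side of the quantize_config.json hunk.
quantize_config = json.loads(
    """
    {
      "bits": 4,
      "group_size": 128,
      "damp_percent": 0.01,
      "desc_act": false,
      "sym": true,
      "true_sequential": true,
      "model_file_base_name": "model"
    }
    """
)


def expected_weights_file(cfg: dict, extension: str = ".safetensors") -> str:
    """Return the weights file name implied by the config.

    Assumption: the file name is base name + extension.
    """
    return cfg["model_file_base_name"] + extension


weights_file = expected_weights_file(quantize_config)
print(weights_file)  # model.safetensors
```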