File not found error while loading model

#14
by Osamarafique998 - opened

I am facing an error while loading the model. I have installed the auto_gptq and transformers libraries, but when the loading code runs it gives me a quantize_config.json file-not-found error. Below I have included my code and the exact error I'm getting.

Code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

Error:
FileNotFoundError: [Errno 2] No such file or directory: 'TheBloke/Llama-2-13B-chat-GPTQ/quantize_config.json'

This is the complete traceback I am getting:

FileNotFoundError Traceback (most recent call last)
Cell In[10], line 11
7 use_triton = False
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
14 trust_remote_code=True,
15 device="cuda:0",
16 use_triton=use_triton,
17 quantize_config=None)
19 """
20 To download from a specific branch, use the revision parameter, as in this example:
21
(...)
28 quantize_config=None)
29 """
31 prompt = "Tell me about AI"

File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py:63, in AutoGPTQForCausalLM.from_quantized(cls, save_dir, device, use_safetensors, use_triton, max_memory, device_map, quantize_config, model_basename, trust_remote_code)
49 @classmethod
50 def from_quantized(
51 cls,
(...)
60 trust_remote_code: bool = False
61 ) -> BaseGPTQForCausalLM:
62 model_type = check_and_get_model_type(save_dir)
---> 63 return GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized(
64 save_dir=save_dir,
65 device=device,
66 use_safetensors=use_safetensors,
67 use_triton=use_triton,
68 max_memory=max_memory,
69 device_map=device_map,
70 quantize_config=quantize_config,
71 model_basename=model_basename,
72 trust_remote_code=trust_remote_code
73 )

File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py:501, in BaseGPTQForCausalLM.from_quantized(cls, save_dir, device, use_safetensors, use_triton, max_memory, device_map, quantize_config, model_basename, trust_remote_code)
498 raise TypeError(f"{config.model_type} isn't supported yet.")
500 if quantize_config is None:
--> 501 quantize_config = BaseQuantizeConfig.from_pretrained(save_dir)
503 if model_basename is None:
504 model_basename = f"gptq_model-{quantize_config.bits}bit-{quantize_config.group_size}g"

File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py:51, in BaseQuantizeConfig.from_pretrained(cls, save_dir)
49 @classmethod
50 def from_pretrained(cls, save_dir: str):
---> 51 with open(join(save_dir, "quantize_config.json"), "r", encoding="utf-8") as f:
52 return cls(**json.load(f))

FileNotFoundError: [Errno 2] No such file or directory: 'TheBloke/Llama-2-13B-chat-GPTQ/quantize_config.json'

I really don't know what's wrong. I ran the code you showed and it works fine, and as you can see there is definitely a quantize_config.json in this repo.

Here's the output I get when I run the code you showed above:

[screenshot of output]

No error.

It must be some kind of environment problem on your system. Maybe it is failing to download the files correctly. It's not a problem with this model, nor, I think, with AutoGPTQ.

Try testing with a normal transformers model, like with the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

print(f"Model config is: {model.config}")

You should see output similar to this:

[screenshot of output]

This "facebook/opt-125m" model loaded successfully without any issue.

But when I test with this model again it gives the same error. Although you were right, I can see the quantize_config.json in the repo, so maybe it is due to some environment issue.
I am running the Llama-2 model on RunPod with a 48GB GPU.
What GPU configuration are you using to run that model?

Is there a way to clone this Hugging Face repo and then test the model? If there is, can you share the code snippet for loading the model from the cloned repo?

Yes, that's easy to do. Here's example code. Make sure to set local_folder to the folder you want to download to.

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from huggingface_hub import snapshot_download

model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
local_folder = "/workspace/test-llama-2"

snapshot_download(repo_id=model_name, local_dir=local_folder, local_dir_use_symlinks=False)

model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(local_folder, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(local_folder,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

input_ids = tokenizer("Llamas are", return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

Output:

[screenshot of generated output]
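
If the download seems to complete but loading still fails, a quick sanity check is to confirm the expected files actually landed on disk before calling from_quantized. This is just a minimal sketch, assuming the same local_folder as above:

import os

local_folder = "/workspace/test-llama-2"
for fname in ("config.json", "quantize_config.json"):
    path = os.path.join(local_folder, fname)
    print(fname, "->", "found" if os.path.isfile(path) else "MISSING")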

Thanks a lot. I will try that!

I used the exact code you provided, but now it's giving another error:


TypeError Traceback (most recent call last)
in <cell line: 16>()
14 tokenizer = AutoTokenizer.from_pretrained("/workspace/test-llama-2", use_fast=True)
15
---> 16 model = AutoGPTQForCausalLM.from_quantized("/workspace/test-llama-2",
17 model_basename=model_basename,
18 use_safetensors=True,

2 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py in from_pretrained(cls, save_dir)
50 def from_pretrained(cls, save_dir: str):
51 with open(join(save_dir, "quantize_config.json"), "r", encoding="utf-8") as f:
---> 52 return cls(**json.load(f))
53
54 def to_dict(self):

TypeError: BaseQuantizeConfig.init() got an unexpected keyword argument 'model_name_or_path'

Ah, I think you must be using an old version of AutoGPTQ.

Update to 0.2.2 or 0.3.0. But there are some bugs in 0.3.0 so for now I recommend using 0.2.2 instead:

pip3 uninstall -y auto-gptq
GITHUB_ACTIONS=true pip3 install auto-gptq==0.2.2
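
After reinstalling, it's worth confirming which version actually ended up in the environment. A minimal check using standard pip metadata (the package name on PyPI is auto-gptq):

# Print the installed auto-gptq version (works on Python 3.8+)
from importlib.metadata import version
print(version("auto-gptq"))  # should report 0.2.2 after the reinstall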

Why is it showing "could not find a version that satisfies the requirement" although it shows that it is a valid version?

[screenshot of pip error]

Run

CUDA_VERSION="" GITHUB_ACTIONS=true pip3 install auto-gptq==0.2.2

Thank you so much for your assistance.
Now it's working for both.

This was working fine for some time, but for the last few days I get:

Traceback (most recent call last):
  File "/home/ubuntu/k2-setup/pycode/asx_titles_llama.py", line 18, in <module>
    model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/micromamba/envs/transormers/lib/python3.11/site-packages/auto_gptq/modeling/auto.py", line 105, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "/home/ubuntu/micromamba/envs/transormers/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 768, in from_quantized
    raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ

AutoGPTQ is 0.3.2

Either this or your snapshot test above gives the same error, interestingly:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import torch
import time
import os
import json
import pandas as pd
from datetime import datetime, timedelta
from get_announcements import get_datestamp, get_datestamp_for_sorting

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        #device="cuda:0",
        device_map=0,
        #device="cpu",
        use_triton=use_triton,
        quantize_config=None)

I'm having this problem with multiple models that were working last week. Even a local copy does the same and I've tried a new env. Anyone else?

I recently updated all my GPTQ models for Transformers compatibility (coming very soon). The model files in all GPTQ repos have been renamed to model.safetensors.

Please check the README again and you'll see that the model_basename line is now: model_basename = "model".

This applies for all branches in all GPTQ models.
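
So if you keep an explicit basename, the updated call is just the earlier example with the new name. A sketch based on the code above:

model_basename = "model"  # renamed from "gptq_model-4bit-128g"

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)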

Or in fact you can simply leave out model_basename now:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

Because the model_basename is now also configured in quantize_config.json.
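
You can check that yourself by inspecting the file in a locally downloaded copy. A sketch, assuming the /workspace/test-llama-2 folder from the snapshot_download example earlier; the keys come from AutoGPTQ's BaseQuantizeConfig, e.g. something like "model_file_base_name": "model":

import json

# Print the quantize_config.json shipped with the repo
with open("/workspace/test-llama-2/quantize_config.json", encoding="utf-8") as f:
    print(json.dumps(json.load(f), indent=2))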

In the next 24 - 48 hours I will be updating all my GPTQ READMEs to explain this in more detail, and provide example code for loading GPTQ models directly from Transformers. I am waiting for the new Transformers and Optimum releases to happen. Transformers just released an hour ago, and Optimum will be releasing later today or early tomorrow.

Thank you for the quick response and all the great work you do.

Hello,

@TheBloke: Do you have an update about the new parametrization, please?

This is my current code, where the model doesn't load. What do I need to change, please?

!unset CUDA_VERSION && pip3 install auto-gptq==0.2.2
!pip3 install transformers

#######################################################################################################

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

#######################################################################################################

article = """Helping Your Teen Lose Weight

Following healthy habits are the essential key to teen weight loss. Without such a habit, teenagers may find it difficult to maintain their healthy weight.
The way today's food consumption is being looked at in this country, healthy eating has surely been put at the wayside. Teenagers of today really have a myriad of delicious food choices to eat. But sad to say, most of them belong to the unhealthy food group. It is now easier for teenagers today to get fat because of the convenience brought about by fast food.

Parents today live in a very busy world where time is spent more on work. Such parents may not have the time to prepare food and sometimes must rely on the nearest pizza or hamburger place to provide the nourishment for their children. But this should not be. Fastfood is considered to be one of the reasons why most teenagers are getting fatter. Fastfood is considered junk food since they are not able to supply all the nourishment that growing kids need. But fast food can really be fattening with the great amounts of fat and carbs that they contain. It is a bad choice if you wish to help your child stay at a healthy weight as he or she grows up.

Your concern to see to it that your teenagers grow up to be healthy and fit individuals is the first step in keeping their weight down. Always bear in mind that teenage obesity is a dangerous and a growing problem in this country. But you can do something about it. You can make effective use of your concern about your teenager's weight by putting it into action. You can help show your teen the way by following a practical plan for success. There's no easy way for teen weight loss. The most important thing that you can do is letting your teenager adopt healthy habits that can last a lifetime. Here are some tips:

  1. Start with a heart-to-heart talk.
    If your see that your teen is getting overweight, chances are, he or she is also concerned about the excess weight. Aside from bringing in lifelong health risks such as high blood pressure and diabetes, the social and emotional consequences of being overweight can have a devastating effect on your teenager. Talk to your teenager about it. Try to offer support and gentle understanding and make him or her verbally aware that you really are concerned. Try also to add in a willingness to help your teen take control of the weight problem that he or she is facing.

  2. As much as possible, resist looking for quick fixes.
    Make your teen realize that losing and maintaining an ideal weight is a lifetime commitment. Encouraging fad diets may rob your growing teen essential nutrients essential to his or her continuing development. Buying weight-loss pills for your teenager and other quick fixes won't be able to address the root of the weight problem. The effects of such quick fixes are often short-lived and you teen may likely balloon back. What you should be able to teach is adopting a lifelong healthy habit. Without a permanent change in unhealthy habits, any weight loss program will only remain a temporary fix.

  3. Promote and encourage doing more calorie-burning activities.
    Just like adults, teens also require about an hour of physical activity everyday. But that doesn't mean sixty solid minutes of pure gut-wrenching activity. You can plan shorter, repeated bursts of activity throughout the day that not only can help burn calories, but also become an enjoyable, fun and worthwhile affair. Sports and hiking can be probable options.
    """

#######################################################################################################

system_message = "You are an assistant dedicated to providing valuable, respectful, and honest support. Your task is to assist with the creation and conversion of content, such as converting text into a Markdown blog article format."

#prompt = """Convert this to a DETAILLED MARKDOWN blog article OF 2500 WORDS WITH AT LEAST WITH MANY SUBTITLES. SUBTITLES WITH TAGS ##, ###, #### IN THE ARTICLE ARE MANDATORY. CREATE A TITLE WITH A # TAG AT THE BEGINNING FOR THE ARTICLE. MARKDOWN FORMATING IS MANDATORY:{article}""".format(article=article)
prompt = """Convert this to a DETAILLED MARKDOWN blog article OF 2500 WORDS WITH AT LEAST WITH MANY SECTIONS. SECTIONS WITH TAGS ##, ###, #### IN THE ARTICLE ARE MANDATORY. CREATE A TITLE WITH A # TAG AT THE BEGINNING FOR THE ARTICLE. MARKDOWN FORMATING IS MANDATORY:{article}""".format(article=article)

prompt_template = f'''[INST] <<SYS>>
{system_message}
<</SYS>>

{prompt} [/INST]'''

#######################################################################################################

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ

logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=3000,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    repetition_penalty=1.15
)

final_md = pipe(prompt_template)[0]['generated_text'].split('[/INST]')[-1].lstrip()
print(final_md)

title = "final_md_file"
invalid_chars = r'/:*?<>|"'
translation_table = title.maketrans('', '', invalid_chars)
title = title.translate(translation_table)

file_name = f"{title}.md"
with open(file_name, "w", encoding="utf-8") as file:
    file.write(final_md)

I recently updated all my GPTQ models for Transformers compatibility (coming very soon). The model files in all GPTQ repos have been renamed to model.safetensors.

Please check the README again and you'll see that the model_basename line is now: model_basename = "model".

This applies for all branches in all GPTQ models.

Or in fact you can simply leave out model_basename now:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

Because the model_basename is now also configured in quantize_config.json.

In the next 24 - 48 hours I will be updating all my GPTQ READMEs to explain this in more detail, and provide example code for loading GPTQ models directly from Transformers. I am waiting for the new Transformers and Optimum releases to happen. Transformers just released yesterday, and Optimum will be releasing some time today.

Hello @TheBloke,

Thank you for your answer, the model loads well now.

I have a CPU-only i7 machine. Is there any way to make it work on that machine, please? My goal was to run this model for free; I currently use a RunPod machine to run it, and in the end the cost is higher than doing the same task in ChatGPT, so if there is a way to run this model on my CPU machine I would be very happy.


model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
