How to run?

#12 opened by ColinS97

Hi, I have tried to run this repo, but I am receiving a state-dict error: "RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM: Missing key(s) in state_dict:"
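For what it's worth, the tensor names stored in the checkpoint can be listed and compared against the keys the error reports as missing (a minimal sketch, assuming the safetensors package; a GPTQ checkpoint normally contains qweight/qzeros/scales tensors rather than the plain weight keys a stock LlamaForCausalLM expects):

    # Sketch: list the tensor names stored in the quantized checkpoint.
    from safetensors import safe_open

    path = "../alpaca-30b-lora-int4/alpaca-30b-4bit.safetensors"
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            print(key)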

Here is what I did:

  1. Clone this repo and clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
  2. Create a new env and install all dependencies
  3. Confirmed that GPTQ works (converted a 7B model downloaded from Facebook to HF, quantized it with GPTQ, and ran the sample prompt; the commands are sketched after this list).
  4. I tried the following two commands to run the model from this repo, but both failed:
    without groupsize:
    CUDA_VISIBLE_DEVICES=0 python llama_inference.py ../alpaca-30b-lora-int4 --wbits 4 --load ../alpaca-30b-lora-int4/alpaca-30b-4bit.safetensors --text "this is llama" --device=0

    with groupsize:
    CUDA_VISIBLE_DEVICES=0 python llama_inference.py ../alpaca-30b-lora-int4 --wbits 4 --groupsize 128 --load ../alpaca-30b-lora-int4/alpaca-30b-4bit.safetensors --text "this is llama" --device=0
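For reference, step 3 looked roughly like this (a sketch, not the exact invocation: convert_llama_weights_to_hf.py ships with transformers, llama.py is the quantizer in GPTQ-for-LLaMa, and the ./llama and ./llama-7b-hf paths are placeholders):

    python convert_llama_weights_to_hf.py --input_dir ./llama --model_size 7B --output_dir ./llama-7b-hf
    CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-7b-hf c4 --wbits 4 --save llama7b-4bit.pt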

I feel like I would need to convert my 30B LLaMA model from the Hugging Face format into some other format and put it into the ../alpaca-30b-lora-int4 folder so that GPTQ can use it, but I don't know how that works with this repo, and there is no info provided.
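From skimming llama_inference.py, my guess (an assumption on my part, not something documented here) is that the positional model argument only needs to be a folder containing the usual Hugging Face config and tokenizer files next to the quantized checkpoint, roughly:

    ../alpaca-30b-lora-int4/
        config.json
        tokenizer.model
        tokenizer_config.json
        alpaca-30b-4bit.safetensors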

Since this is related to GPTQ and since you seem to be quantizing your own model, it’s best to ask in their repository.

elinas changed discussion status to closed
