issues on runpod

#5
by matthewberman - opened

I tried following the instructions but keep running into an issue on runpod:

```
Traceback (most recent call last):
  File "/workspace/text-generation-webui/server.py", line 100, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 125, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
```

This happens after setting the parameters/variables and then clicking "Reload the model".

How do I fix this?

This means GPTQ-for-LLaMa is not installed and therefore you can't run GPTQs until it's installed.
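As a minimal illustration of why the traceback ends this way (not code from the webui itself): `modules/GPTQ_loader.py` does a plain `import llama_inference_offload`, which only succeeds when a GPTQ-for-LLaMa checkout is on `sys.path`. You can check for the module the same way Python does:

```python
# Sketch: check whether GPTQ-for-LLaMa's offload module is importable.
# If it isn't, loading any GPTQ model will fail exactly as in the
# traceback above.
import importlib.util

def gptq_available(module_name: str = "llama_inference_offload") -> bool:
    """Return True if the given module can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None

if not gptq_available():
    print("GPTQ-for-LLaMa is missing; GPTQ models will fail to load")
```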

Are you using Runpod's own template? That doesn't have GPTQ available. I have one that does, which is all ready to go for both GPTQ and GGML with CUDA accel: https://runpod.io/gsc?template=qk29nkmbfr&ref=eexqfacd

Amazing, going to try that now.

Will that also work with the falcon model? I was running into the trust_remote_code=True error with that on runpod template

Yeah it probably would actually. I added AutoGPTQ to the template recently. I don't recall if I've specifically tested it but it looks like it has all it needs.

Just bear in mind that Falcon is horribly slow at the moment! Hopefully that will improve in time. It's very much an experimental GPTQ atm.
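For the `trust_remote_code=True` error mentioned above: Falcon ships custom modelling code, so transformers refuses to load it unless you opt in. A hedged sketch of the call shape (the model id is illustrative, and actually running the loader needs transformers installed plus a model download):

```python
# Sketch: loading a Falcon model with transformers requires opting in
# to its remote modelling code via trust_remote_code=True.
from typing import Tuple

def load_falcon(model_id: str = "tiiuae/falcon-7b-instruct") -> Tuple[object, object]:
    """Load a Falcon tokenizer/model pair, opting in to remote code."""
    # Deferred import so the sketch stays runnable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model
```

In text-generation-webui itself the equivalent is (if I recall correctly) the `--trust-remote-code` launch flag rather than editing code.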

@TheBloke would you mind sharing your dockerfile?
