issues on runpod

#5
by matthewberman - opened

I tried following the instructions but keep running into an issue on runpod:

```
Traceback (most recent call last):
  File "/workspace/text-generation-webui/server.py", line 100, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 125, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
```

This happens after setting the parameters/variables and then clicking "Reload the model".

How do I fix this?

This means GPTQ-for-LLaMa is not installed and therefore you can't run GPTQs until it's installed.
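As a minimal illustration of why the traceback ends this way (not code from the webui itself): `modules/GPTQ_loader.py` does a plain `import llama_inference_offload`, which only succeeds when a GPTQ-for-LLaMa checkout is on `sys.path`. You can check for the module the same way Python does:

```python
# Sketch: check whether GPTQ-for-LLaMa's offload module is importable.
# If it isn't, loading any GPTQ model will fail exactly as in the
# traceback above.
import importlib.util

def gptq_available(module_name: str = "llama_inference_offload") -> bool:
    """Return True if the given module can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None

if not gptq_available():
    print("GPTQ-for-LLaMa is missing; GPTQ models will fail to load")
```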

Are you using Runpod's own template? That doesn't have GPTQ available. I have one that does, which is all ready to go for both GPTQ and GGML with CUDA accel: https://runpod.io/gsc?template=qk29nkmbfr&ref=eexqfacd

Amazing, going to try that now.

Will that also work with the falcon model? I was running into the trust_remote_code=True error with that on runpod template

Yeah it probably would actually. I added AutoGPTQ to the template recently. I don't recall if I've specifically tested it but it looks like it has all it needs.

Just bear in mind that Falcon is horribly slow at the moment! Hopefully that will improve in time. It's very much an experimental GPTQ atm.
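For the `trust_remote_code=True` error mentioned above: Falcon ships custom modelling code, so transformers refuses to load it unless you opt in. A hedged sketch of the call shape (the model id is illustrative, and actually running the loader needs transformers installed plus a model download):

```python
# Sketch: loading a Falcon model with transformers requires opting in
# to its remote modelling code via trust_remote_code=True.
from typing import Tuple

def load_falcon(model_id: str = "tiiuae/falcon-7b-instruct") -> Tuple[object, object]:
    """Load a Falcon tokenizer/model pair, opting in to remote code."""
    # Deferred import so the sketch stays runnable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model
```

In text-generation-webui itself the equivalent is (if I recall correctly) the `--trust-remote-code` launch flag rather than editing code.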

@TheBloke would you mind sharing your dockerfile?
