Why does TheBloke/guanaco-65B-GPTQ run slowly even on an 80GB GPU?

#17
by balajivantari - opened

The code I tried is given below.

I even tried with LangChain, but it's still too slow.

Can you tell me how to make this model run faster?

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "Guanaco-65B-GPTQ-4bit.act-order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit GPTQ model onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

prompt = "who is the first president of US"
prompt_template = f'''### Instruction: {prompt}

Response:'''

# Generate up to 512 new tokens with sampling
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

Thank you.

Hello, can you please tell me how to make it fast?

Hey Bloke, can you please give a clarification?

It may be because the AutoGPTQ CUDA extension hasn't been built. Show the full output you see when running the script.
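A quick way to check this directly is to try importing the compiled extension; this is a minimal sketch, using the autogptq_cuda module name that appears in the test script later in this thread and applies to AutoGPTQ 0.2.x:

import importlib

# If the compiled extension is missing, AutoGPTQ falls back to a much
# slower path, which is the failure mode described above.
try:
    importlib.import_module("autogptq_cuda")
    print("autogptq_cuda found: compiled CUDA kernels are available")
except ImportError:
    print("autogptq_cuda NOT found: running without the compiled kernels")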

Screenshot (251).png
Here is the code I tried and the output I got; it took around 10+ minutes.
Screenshot (250).png
Can you please tell me how to fix this?

Please show me the output of:

!python -c 'import torch ; print(torch.__version__) ; print(torch.cuda.is_available())'
!nvidia-smi
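A slightly more detailed check from Python can also help confirm which GPU PyTorch sees and how much memory it has (an optional sketch, not part of the output asked for above):

import torch

# Same checks as above, plus the GPU name and total memory,
# to confirm the 80GB card is the one PyTorch is actually using.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)
    print(f"{props.total_memory / 1024**3:.1f} GiB total memory")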

Screenshot (253).png
Yes, here it is.

I can't see any obvious problems there. However, you're using the latest development version of AutoGPTQ, which has not been as well tested.

Please use version 0.2.2:

!pip uninstall auto-gptq
!pip install auto-gptq==0.2.2

Run that, then run the following test script:

import torch
import autogptq_cuda

print(torch.__version__)
print(torch.cuda.is_available())
print(autogptq_cuda)

And show me the output.
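As an optional extra check, you can also confirm which auto-gptq release is actually installed in the environment (a small sketch using importlib.metadata, available on Python 3.8+):

import importlib.metadata

# Confirm the pinned release is the one this Python environment sees.
print(importlib.metadata.version("auto-gptq"))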

Screenshot (254).png
Here it is.

I think the problem is with the installation, am I right? Can you tell me the correct procedure for installing AutoGPTQ?

Try building from source:

!git clone https://github.com/PanQiWei/AutoGPTQ
!cd AutoGPTQ && git checkout v0.2.1 && pip install .

Then test again

Is it solved? TheBloke, roughly how fast does it run in your local tests? How many tokens per second?

It is solved.
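For the tokens-per-second question above, here is a rough way to measure generation speed with the pipe, tokenizer and prompt_template from the original script (an approximate sketch; re-tokenizing the output only estimates the new-token count):

import time

# Time one generation and estimate tokens per second.
start = time.time()
generated = pipe(prompt_template)[0]['generated_text']
elapsed = time.time() - start

# The pipeline returns the prompt plus the completion, so subtract
# the prompt length to approximate the number of new tokens.
prompt_len = len(tokenizer(prompt_template)['input_ids'])
total_len = len(tokenizer(generated)['input_ids'])
new_tokens = total_len - prompt_len
print(f"{new_tokens} new tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.2f} tokens/s")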
