transformers pipeline doesn't output text

#2
by neuralworm - opened

I ran the example from the model card. Generation with model.generate() works, but generation through the transformers pipeline returns only the prompt, with no generated text.
I have the latest transformers and auto_gptq installed.

Code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-13B-German-Assistant-v4-GPTQ"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        quantize_config=None)
"""

prompt = "Wo steht der Eifelturm?"
prompt_template=f'''### User: {prompt}
### Assistant:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
)

print(pipe(prompt_template)[0]['generated_text'])

Output:

*** Generate:
<s> ### User: Wo steht der Eifelturm?
### Assistant:

Der Eifelturm steht in Paris, Frankreich. Er ist ein berühmter Aussichtsturm und ein Wahrzeichen der Stadt. Der Turm wurde im 19. Jahrhundert erbaut und ist seitdem ein beliebter Ort für Touristen und Einheimische.
</s>
*** Pipeline:
### User: Wo steht der Eifelturm?
### Assistant:
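
To make the difference between the two outputs explicit, here is a minimal sketch that strips the prompt from each result and shows what is left. The helper name `continuation` and the shortened sample strings are hypothetical, written only for illustration. It relies on the fact that the text-generation pipeline includes the prompt in `generated_text` by default, so an empty remainder means no new tokens were produced.

```python
def continuation(prompt: str, generated_text: str) -> str:
    """Return the part of generated_text that follows the prompt (hypothetical helper)."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Prompt not found as a prefix: return the text unchanged.
    return generated_text


prompt_template = "### User: Wo steht der Eifelturm?\n### Assistant:\n"

# Shortened stand-ins for the two outputs shown above (special tokens omitted).
generate_text = prompt_template + "\nDer Eifelturm steht in Paris, Frankreich."
pipeline_text = prompt_template

print(repr(continuation(prompt_template, generate_text)))
print(repr(continuation(prompt_template, pipeline_text)))
```

With the real objects, the same check would be applied to `pipe(prompt_template)[0]['generated_text']`.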
