Transformer Pipeline

#26, opened by francescoyoubiquo

I am loading the Gemma 2b-it model with this code:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_version = 2
model_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"
model_config_path = f"/kaggle/input/gemma/transformers/2b-it/{model_version}/config.json"

tokenizer_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"

# Load the model configuration explicitly, then the model with it
model_config = AutoConfig.from_pretrained(model_config_path)
model = AutoModelForCausalLM.from_pretrained(model_id, config=model_config, device_map="auto")

# AutoTokenizer reads tokenizer_config.json from the directory on its own;
# device_map and return_tensors are not from_pretrained arguments
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

Then I execute the generation as follows:

input_text = "Write a python function to print all elements of a list."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))

Some text is generated. But when I create a transformers.pipeline as follows, the only text in the output is the input text.

import transformers

query_pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    framework="pt",
    # device_map is omitted: the model above is already dispatched with
    # device_map="auto", so the pipeline should not try to place it again
)

input_text = "Write a python function to print all elements of a list."
result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)

print(f"Result: {result}")

This is the output:
Result: [{'generated_text': 'Write a python function to print all elements of a list.'}]

Is this procedure correct, or have I made some mistakes?

However, when the chat template is applied in this way before running the pipeline, some text is generated:

chat = [
    { "role": "user", "content": input_text },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
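
For completeness, this is the full flow I mean, as a sketch (query_pipeline is the pipeline created above):

# Pass the templated prompt to the text-generation pipeline.
# For Gemma, apply_chat_template wraps the message in <start_of_turn> /
# <end_of_turn> control tokens and, as far as I can tell, prepends <bos>.
result = query_pipeline(
    prompt,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)
print(result[0]["generated_text"])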

Text is also generated when creating a pipeline of type "conversational" and passing a chat like this:

chat = [
    { "role": "user", "content": input_text },
]
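
That is, roughly like this (a sketch; it assumes a transformers version that still ships the conversational pipeline, which was later deprecated):

from transformers import Conversation

# The conversational pipeline applies the chat template by itself
conv_pipeline = transformers.pipeline(
    task="conversational",
    model=model,
    tokenizer=tokenizer,
)
conversation = conv_pipeline(Conversation(chat), max_new_tokens=64)
print(conversation.messages[-1]["content"])  # the assistant's reply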

Is there a problem with the TextGenerationPipeline?

I am struggling with this too.

Google org

Is this using the right chat template and control tokens under the hood?
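
One way to check is to render the template as text and inspect the control tokens, a quick sketch:

# Render Gemma's chat template without tokenizing to see the raw prompt,
# including <bos>, <start_of_turn> and <end_of_turn>
chat = [{ "role": "user", "content": "Hello" }]
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))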

I had the same issue: the generated_text was the same as the input. I found a way to fix this.

Modify the code:

result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)

to:

result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
    add_special_tokens=True,
)
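
My understanding of why this helps (an assumption on my part): the text-generation pipeline tokenizes the prompt without special tokens by default, so Gemma never sees its <bos> token and stops almost immediately; add_special_tokens=True forces <bos> to be prepended. You can compare what the model sees in both cases:

# The second list should start with tokenizer.bos_token_id, which Gemma
# expects at the beginning of every prompt
print(tokenizer(input_text, add_special_tokens=False)["input_ids"])
print(tokenizer(input_text, add_special_tokens=True)["input_ids"])
print(tokenizer.bos_token, tokenizer.bos_token_id)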

Google org

Ah, good to know! cc @osanseviero in case we should document this somewhere?

To use the pipeline, the chat template must be applied. Using the pipeline without the chat template does not generate any new tokens.
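
For example, a sketch that assumes a transformers version whose text-generation pipeline accepts chat messages directly and applies the template itself:

# Passing a list of messages instead of a plain string makes the pipeline
# apply the chat template, including Gemma's control tokens
chat = [{ "role": "user", "content": "Write a python function to print all elements of a list." }]
result = query_pipeline(chat, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply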

Interesting. cc @ArthurZ @Rocketknight1, do you think there is something we need to upstream in the transformers pipeline?

But shouldn't the text-generation pipeline produce new tokens, as it does for all the other models?
Also, with gemma-7b-it it sometimes generates tokens for me.
