Is it possible to make the model return only the response, without the prompt?

#61
by mduran159

With the example code you posted I can only get it to return the entire prompt with the model response appended at the end, as is usual with these models. But when you use a pipeline you can avoid all of that, so is there a way to make it work like an LLM with a pipeline and return only the model's response/answer?
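
For reference, this is the pipeline behaviour I mean: the text-generation pipeline has a return_full_text flag that drops the prompt. A minimal sketch (the model name here is just a placeholder):

    from transformers import pipeline

    # placeholder model name; any text-generation checkpoint works
    pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

    # return_full_text=False makes the pipeline return only the newly
    # generated text, without echoing the prompt
    result = pipe("What is the capital of France?", max_new_tokens=50, return_full_text=False)
    print(result[0]["generated_text"])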

I'm not sure what's conventional, as this is the most I've used transformers, but you can always strip it based on the special tokens, right?

    # Llama 3 style markers that delimit the assistant turn
    start_token = "<|start_header_id|>assistant<|end_header_id|>"
    end_token = "<|eot_id|>"

    marker_index = processor_output.find(start_token)
    end_index = processor_output.rfind(end_token)

    # find() returns -1 on a miss, so check before offsetting past the marker
    if marker_index != -1 and end_index != -1:
        start_index = marker_index + len(start_token)
        if start_index < end_index:
            content = processor_output[start_index:end_index].strip()
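
For completeness, processor_output above is assumed to be the full decoded string from a generate call, with special tokens kept so the markers are present:

    # sketch; model, processor, and inputs are assumed from the earlier example
    output_ids = model.generate(**inputs, max_new_tokens=256)
    processor_output = processor.decode(output_ids[0], skip_special_tokens=False)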

I know skip_special_tokens can be set while decoding, but the special tokens seem to provide good structure.
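
If you'd rather avoid string matching altogether, you can also slice off the prompt tokens before decoding. A sketch, assuming the same inputs and output_ids as above:

    # keep only the tokens generated after the prompt
    prompt_length = inputs["input_ids"].shape[-1]
    response = processor.decode(output_ids[0][prompt_length:], skip_special_tokens=True)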
