Pipeline tokenizer does not get initialized when loading from string

by connorboyle - opened

If I try loading a pipeline for this model:

>>> from transformers import pipeline
>>> pipe = pipeline(model="microsoft/Phi-3-mini-128k-instruct", trust_remote_code=True)

the tokenizer is initialized to None and the pipeline cannot be called without crashing.

>>> type(pipe.tokenizer)
<class 'NoneType'>
>>> pipe("Hello, world")
TypeError: 'NoneType' object is not callable

However, if the tokenizer is initialized manually and passed in, the pipeline can be called as normal:

>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
>>> pipe = pipeline(model="microsoft/Phi-3-mini-128k-instruct", tokenizer=tokenizer, trust_remote_code=True)
>>> pipe("Hello, world")
[{'generated_text': 'Hello, world!\n\n### Exercises\n\n1. Write a Python program'}]
connorboyle changed discussion title from Pipeline does not work when loading from string to Pipeline tokenizer does not initialized when loading from string
Microsoft org

It's possibly due to trust_remote_code=True. It should work as soon as HF releases version 4.41.0 of transformers.
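
For anyone who needs code that runs both before and after that release, here is a minimal sketch that passes the tokenizer explicitly only on older installs. It assumes the fix does land in transformers 4.41.0 as suggested above; the helper names (parse_release, needs_explicit_tokenizer, build_pipeline) are made up for illustration and are not part of transformers.

```python
from importlib.metadata import version

# Release named in the reply above; assumed (not confirmed here) to contain the fix.
FIXED_IN = (4, 41, 0)


def parse_release(version_string):
    """Return the first three numeric components of a version string as ints."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as "4.41" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_explicit_tokenizer(version_string):
    """True when the installed release is older than the assumed fix."""
    # Pre-release suffixes are ignored, so "4.41.0.dev0" counts as 4.41.0.
    return parse_release(version_string) < FIXED_IN


def build_pipeline(model_id="microsoft/Phi-3-mini-128k-instruct"):
    """Build the pipeline, applying the manual-tokenizer workaround if needed."""
    # Imported lazily so the helpers above can be used without transformers.
    from transformers import AutoTokenizer, pipeline

    kwargs = {"model": model_id, "trust_remote_code": True}
    if needs_explicit_tokenizer(version("transformers")):
        # Same workaround as in the original post.
        kwargs["tokenizer"] = AutoTokenizer.from_pretrained(model_id)
    return pipeline(**kwargs)
```

Calling build_pipeline() downloads the model, so it is left uncalled here. Passing the tokenizer explicitly on every version should also be harmless, which makes the version check optional.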

connorboyle changed discussion title from Pipeline tokenizer does not initialized when loading from string to Pipeline tokenizer does not get initialized when loading from string
nguyenbh changed discussion status to closed

