Fix the tokenizer class
#53 by sgugger - opened
The configured tokenizer class is the wrong one, so the tokenizer is instantiated as `PreTrainedTokenizerFast` and returns `token_type_ids` when used on an input. The model cannot handle those, so the pipeline crashes for this model.
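For readers who want to apply the same kind of fix to a local copy of a checkpoint, here is a minimal sketch. It assumes the tokenizer class is read from the `tokenizer_class` field of `tokenizer_config.json` and that the intended value is `BloomTokenizerFast`; the diff of this PR is not shown here, so both are assumptions. The helper `set_tokenizer_class` is a hypothetical name, not part of any library.

```python
import json
import tempfile
from pathlib import Path


def set_tokenizer_class(config_path, tokenizer_class):
    """Rewrite the `tokenizer_class` field of a local tokenizer_config.json.

    Hypothetical helper for illustration; returns the previous value.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("tokenizer_class")
    config["tokenizer_class"] = tokenizer_class
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


# Demonstration on a throwaway file, not on the real repository.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "tokenizer_config.json"
    demo_path.write_text(
        json.dumps({"tokenizer_class": "PreTrainedTokenizerFast"}),
        encoding="utf-8",
    )
    # Assumed target class name; check the merged diff before relying on it.
    old = set_tokenizer_class(demo_path, "BloomTokenizerFast")
    new = json.loads(demo_path.read_text(encoding="utf-8"))["tokenizer_class"]
```

With a model-specific class configured, the tokenizer loaded for the checkpoint would be expected to stop emitting `token_type_ids`, which is the input the model rejects.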
Thanks for fixing it!
Note for other readers: @ybelkada also made this change in the smaller checkpoint repos (https://huggingface.co/bigscience/bloom-1b3/discussions/8).
lysandre changed pull request status to merged