Tokenization issue
#2 by sh0416 · opened
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
assert "from ." == tokenizer.decode(tokenizer("from .")["input_ids"], skip_special_tokens=True, cleanup_tokenization_spaces=False)
This raises an assertion error. I suspect that the encoding process removes the space. How should I handle it?
Oh, there is a typo in the option name. Passing clean_up_tokenization_spaces=False works great. Sorry for the confusion.
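For anyone who hits the same assertion, here is the corrected snippet with the properly spelled keyword argument. This is a minimal sketch and assumes the facebook/incoder-1B tokenizer can be loaded from the Hub (or a local cache).

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")

text = "from ."
# The correct keyword is clean_up_tokenization_spaces (not cleanup_tokenization_spaces);
# with it set to False, the space after "from" is preserved during decoding.
decoded = tokenizer.decode(
    tokenizer(text)["input_ids"],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
assert decoded == text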
sh0416 changed discussion status to closed