Fast Tokenization for Phi3-small

#29
by mfajcik - opened

Dear authors,
I was wondering why this model only provides a slow (Python) tokenizer and not a fast (Rust-backed) one.

>>> y = AutoTokenizer.from_pretrained("microsoft/Phi-3-small-128k-instruct", trust_remote_code=True, use_fast=True)
>>> y.is_fast
False

Unfortunately, this makes the model unusable in some cases that require the token offset_mapping, e.g., for reversible tokenization (as is the case in my current research).
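To make the dependency concrete: in transformers, `tokenizer(text, return_offsets_mapping=True)` only works when `tokenizer.is_fast` is True; slow tokenizers raise an error. Below is a minimal sketch of why offsets enable reversible tokenization, using a toy regex tokenizer as a stand-in (assumption: no real model is loaded, only the offset logic is illustrated).

```python
# Toy illustration of reversible tokenization via offset mappings.
# A fast tokenizer returns, for each token, the (start, end) character
# span it was cut from, so the original text can be recovered exactly.
import re

def toy_encode(text):
    """Return tokens and their (start, end) character offsets."""
    tokens, offsets = [], []
    for m in re.finditer(r"\S+", text):
        tokens.append(m.group())
        offsets.append((m.start(), m.end()))
    return tokens, offsets

def span_of(text, offsets):
    """Recover the original substring covered by a token span."""
    if not offsets:
        return ""
    return text[offsets[0][0]:offsets[-1][1]]

text = "Fast tokenizers expose character offsets."
tokens, offsets = toy_encode(text)
print(span_of(text, offsets) == text)  # the full text is recoverable
```

Without offsets (the slow-tokenizer situation), detokenization has to guess at whitespace and normalization, so exact reconstruction is not guaranteed.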

Is this intentional? Other Phi tokenizers are fast, e.g., "microsoft/Phi-3-mini-4k-instruct".

Thank you for any advice.
Best,
Martin

mfajcik changed discussion title from Fast Tokenization for Phi to Fast Tokenization for Phi3-small
