How to specify SentencePiece tokenizer.model path?
When I run this code:

```python
from transformers import AutoModelWithLMHead, AutoTokenizer

qg_tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/t5-base-finetuned-question-generation-ap",
    cache_dir=cache_dir,
    use_fast=False,
    truncation=True,
    return_tensor='pt',
    # tokenizer_file='.cache/models--mrm8488--t5-base-finetuned-question-generation-ap/snapshots/c81cbaf0ec96cc3623719e3d8b0f238da5456ca8/spiece.model',
    device_map='auto',
)

qg_model = AutoModelWithLMHead.from_pretrained(
    "mrm8488/t5-base-finetuned-question-generation-ap",
    cache_dir=cache_dir,
    use_fast=True,
    truncation=True,
    return_tensor='pt',
    device_map='auto',
)
```
I get an error:

```
Converting from Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast convertors: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', ...]
```
However, when I pass the spiece.model path as `tokenizer_file` (the commented-out line above), it still doesn't work.
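For reference, this is roughly the variant I tried, with `tokenizer_file` pointing at the spiece.model file in my local Hugging Face cache:

```python
# Same call as above, but with tokenizer_file pointing at the cached spiece.model
# (this still raises the same Tiktoken/SentencePiece conversion error for me)
qg_tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/t5-base-finetuned-question-generation-ap",
    cache_dir=cache_dir,
    use_fast=False,
    tokenizer_file='.cache/models--mrm8488--t5-base-finetuned-question-generation-ap/snapshots/c81cbaf0ec96cc3623719e3d8b0f238da5456ca8/spiece.model',
)
```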
Can anyone help me with this?