What tokenizer should I use?

#1
by Tianming - opened

Thanks for your great work.
I would like to know which tokenizer fits this model?

Thanks for your reply.
We used Jieba to preprocess the data. For the vocabulary, this model uses Roberta_zh's vocabulary.

Is the tokenizer from 'bert-base-chinese' usable with this model? https://huggingface.co/bert-base-chinese/tree/main
