AssertionError: Non-consecutive added token '荜' found. Should have index 21170 but has index 21128 in saved vocabulary.

#1
by HokyeeJau - opened

There seems to be something wrong with the indices of the Chinese characters in the vocabulary.

When I first loaded the tokenizer, the index appeared to be wrong, as shown in the attached screenshot.
Following the hint in the error message, I changed the index in the added_tokens.json file, but then the same assertion was raised again for another token.

Is there anything I can do to avoid this kind of error?
Thank you.
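For reference, this is a sketch of how the re-indexing could be automated instead of editing entries one by one. It rewrites added_tokens.json so every added token gets a consecutive index starting right after the base vocabulary, which is what the assertion expects. The directory path and the tokenizer class are placeholders, and I am assuming a BERT-style slow tokenizer whose base vocabulary lives in vocab.txt:

```python
import json
from pathlib import Path

from transformers import BertTokenizer  # placeholder; use your model's tokenizer class

tokenizer_dir = Path("path/to/saved/tokenizer")  # placeholder path
added_tokens_path = tokenizer_dir / "added_tokens.json"

# Count the base vocabulary to find where added-token indices must start.
with open(tokenizer_dir / "vocab.txt", encoding="utf-8") as f:
    base_vocab_size = sum(1 for _ in f)

# Load the current (possibly non-consecutive) added-token mapping.
with open(added_tokens_path, encoding="utf-8") as f:
    added_tokens = json.load(f)

# Re-index every added token consecutively, preserving the original order.
fixed = {
    token: base_vocab_size + i
    for i, token in enumerate(sorted(added_tokens, key=added_tokens.get))
}

with open(added_tokens_path, "w", encoding="utf-8") as f:
    json.dump(fixed, f, ensure_ascii=False, indent=2)

# The tokenizer should now load without the non-consecutive index assertion.
tokenizer = BertTokenizer.from_pretrained(str(tokenizer_dir))
```

I am not sure this is the intended fix, since it changes the token-to-id mapping and any model embeddings trained against the old indices would no longer line up.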
[Screenshot: the AssertionError raised when loading the tokenizer]
