AssertionError: Non-consecutive added token '荜' found. Should have index 21170 but has index 21128 in saved vocabulary.

#1
by HokyeeJau - opened

There seems to be something wrong with the indices of the Chinese characters in the vocabulary.

When I first loaded the tokenizer, the index appeared to be wrong, as shown in the attached screenshot.
Following the hint in the error message, I changed the index in the added_tokens.json file, but then the same assertion was raised again for another token.

Is there anything I can do to avoid this kind of error?
Thank you.
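For reference, this is a sketch of how the re-indexing could be automated instead of editing entries one by one. It rewrites added_tokens.json so every added token gets a consecutive index starting right after the base vocabulary, which is what the assertion expects. The directory path and the tokenizer class are placeholders, and I am assuming a BERT-style slow tokenizer whose base vocabulary lives in vocab.txt:

```python
import json
from pathlib import Path

from transformers import BertTokenizer  # placeholder; use your model's tokenizer class

tokenizer_dir = Path("path/to/saved/tokenizer")  # placeholder path
added_tokens_path = tokenizer_dir / "added_tokens.json"

# Count the base vocabulary to find where added-token indices must start.
with open(tokenizer_dir / "vocab.txt", encoding="utf-8") as f:
    base_vocab_size = sum(1 for _ in f)

# Load the current (possibly non-consecutive) added-token mapping.
with open(added_tokens_path, encoding="utf-8") as f:
    added_tokens = json.load(f)

# Re-index every added token consecutively, preserving the original order.
fixed = {
    token: base_vocab_size + i
    for i, token in enumerate(sorted(added_tokens, key=added_tokens.get))
}

with open(added_tokens_path, "w", encoding="utf-8") as f:
    json.dump(fixed, f, ensure_ascii=False, indent=2)

# The tokenizer should now load without the non-consecutive index assertion.
tokenizer = BertTokenizer.from_pretrained(str(tokenizer_dir))
```

I am not sure this is the intended fix, since it changes the token-to-id mapping and any model embeddings trained against the old indices would no longer line up.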
[Screenshot: the AssertionError raised when loading the tokenizer]
