Update distilbert_japanese_tokenizer.py

#4
by liwii - opened

As discussed in the community, the current tokenizer code does not work with transformers>=4.34 because of the tokenizer refactoring introduced in that version.

With that change, PreTrainedTokenizer.__init__() accesses get_vocab(), so self.subword_tokenizer_type must be initialized before calling super().__init__() in DistilBertJapaneseTokenizer.

This issue was already fixed in transformers in commit 2da8853; this PR basically follows that change.
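The initialization-order issue can be sketched as follows. This is a simplified illustration, not the actual transformers code: the class names and get_vocab() bodies here are stand-ins, and only subword_tokenizer_type is taken from the real tokenizer.

```python
# Minimal sketch of the init-order problem (illustrative names, not the
# real transformers classes).

class PreTrainedTokenizerBase:
    def __init__(self):
        # Since the 4.34 refactoring, the base __init__ calls get_vocab(),
        # which may rely on subclass attributes being set already.
        self.vocab_size = len(self.get_vocab())

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(PreTrainedTokenizerBase):
    def __init__(self):
        super().__init__()  # get_vocab() runs here...
        self.subword_tokenizer_type = "wordpiece"  # ...before this is set

    def get_vocab(self):
        # Reads the attribute, so this raises AttributeError under the
        # new call order.
        assert self.subword_tokenizer_type == "wordpiece"
        return {"[UNK]": 0}


class FixedTokenizer(PreTrainedTokenizerBase):
    def __init__(self):
        # The fix: set attributes needed by get_vocab() before super().__init__().
        self.subword_tokenizer_type = "wordpiece"
        super().__init__()

    def get_vocab(self):
        assert self.subword_tokenizer_type == "wordpiece"
        return {"[UNK]": 0}
```

Constructing BrokenTokenizer() fails with AttributeError, while FixedTokenizer() initializes cleanly, which is the same reordering this PR applies.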

Confirmed it works with my repository forked from line-corporation/line-distilbert-base-japanese.

Looks good to me! Thank you!

kajyuuen changed pull request status to merged
