elishowk

Automatic correction of README.md metadata. Contact website@huggingface.co for any question

e960953 about 3 years ago

preview code

raw

history blame

469 Bytes

metadata

language: ja
license: cc-by-sa-4.0
datasets:
  - wikipedia

BERT base Japanese (character-level tokenization with whole word masking, jawiki-20200831)

This pretrained model is almost the same as cl-tohoku/bert-base-japanese-char-v2 but do not need fugashi or unidic_lite. The only difference is in word_tokenzer_type property (specify basic instead of mecab) in tokenizer_config.json.