conan1024hao committed · Commit 3d66092 · 1 Parent(s): c799aa4 · update readme
README.md CHANGED

@@ -32,7 +32,7 @@ You can fine-tune this model on downstream tasks.

## Tokenization

- `BertJapaneseTokenizer` now supports automatic tokenization for [Juman++](https://github.com/ku-nlp/jumanpp). However, if your dataset is large, tokenization may take a long time since `BertJapaneseTokenizer` does not yet support fast tokenization. You can still do the Juman++ tokenization yourself and use the old model [nlp-waseda/roberta-large-japanese](https://huggingface.co/nlp-waseda/roberta-large-japanese).
+ `BertJapaneseTokenizer` now supports automatic tokenization for [Juman++](https://github.com/ku-nlp/jumanpp). However, if your dataset is large, tokenization may take a long time since `BertJapaneseTokenizer` does not yet support fast tokenization. You can still do the Juman++ tokenization yourself and use the old model [nlp-waseda/roberta-large-japanese-seq512](https://huggingface.co/nlp-waseda/roberta-large-japanese-seq512).

Juman++ 2.0.0-rc3 was used for pretraining. Each word is tokenized into tokens by [sentencepiece](https://github.com/google/sentencepiece).
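For reference, a minimal sketch of the automatic Juman++ tokenization described in the changed line, assuming the `transformers` library and Juman++ (with its Python binding) are installed. The model ID used below is an assumption for this repository and may differ:

```python
# Minimal sketch, not part of the original diff. Assumes transformers and
# Juman++ (with its Python binding) are installed; the model ID is an assumed
# placeholder for this repository.
from transformers import AutoTokenizer

# BertJapaneseTokenizer first segments the text into words with Juman++,
# then splits each word into subword tokens using the sentencepiece vocabulary.
tokenizer = AutoTokenizer.from_pretrained("nlp-waseda/roberta-large-japanese-with-auto-jumanpp")

text = "早稲田大学で自然言語処理を学ぶ。"
print(tokenizer.tokenize(text))
```

If the dataset is large, one can instead run Juman++ offline and pass the pre-segmented, whitespace-separated text to the old model, as the README notes.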