sho-takase committed on
Commit
4327e37
1 Parent(s): ab033e8

Fix readme

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -52,7 +52,7 @@ for t in text:
  Our training corpus consists of the Japanese portions of publicly available corpus such as C4, CC-100, and Oscar.
  We also incorporated the Web texts crawled by in-house system.
  The total size of our training corpus is about 650 GB.
- The trained model achieves 8.57 perplexity on the internal validation sets of Japanese C4,
+ The trained model achieves 8.57 perplexity on the internal validation sets of Japanese C4.
 
  ## Tokenization
  We use a sentencepiece tokenizer with a unigram language model and byte-fallback.
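
For readers unfamiliar with the byte-fallback mentioned in the Tokenization context line, here is a minimal pure-Python sketch of the idea: a piece missing from the vocabulary is emitted as one `<0xNN>` token per UTF-8 byte instead of a single unknown token. The vocabulary `TOY_VOCAB` and the helpers `byte_fallback_pieces` and `decode_pieces` are hypothetical names made up for illustration. This is not the model's tokenizer and does not call the sentencepiece library.

```python
# Toy illustration of byte-fallback. TOY_VOCAB is a made-up vocabulary,
# not the vocabulary of the model described in this README.
TOY_VOCAB = {"日本", "語", "の", "モデル"}


def byte_fallback_pieces(piece: str, vocab: set[str]) -> list[str]:
    """Return [piece] if it is in vocab, else one <0xNN> token per UTF-8 byte."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]


def decode_pieces(pieces: list[str]) -> str:
    """Rebuild the text: byte tokens contribute one raw byte, others their UTF-8 bytes."""
    buf = bytearray()
    for p in pieces:
        if len(p) == 6 and p.startswith("<0x") and p.endswith(">"):
            buf.append(int(p[3:5], 16))  # hex digits between "<0x" and ">"
        else:
            buf.extend(p.encode("utf-8"))
    return buf.decode("utf-8")


if __name__ == "__main__":
    # "鮨" is absent from TOY_VOCAB, so it falls back to byte tokens.
    pieces = ["日本", "語"] + byte_fallback_pieces("鮨", TOY_VOCAB)
    print(pieces)
    print(decode_pieces(pieces))
```

Because every byte value has a token, any input string can be encoded and decoded without loss, at the cost of longer sequences for rare characters.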