tianyuz committed
Commit 2b2c88d
Parent: f3cdc9a

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  language: ja
- thumbnail: https://github.com/rinnakk/japanese-gpt2/blob/master/rinna.png
+ thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
  tags:
  - ja
  - japanese
@@ -11,9 +11,9 @@ tags:
  license: mit
  datasets:
  - cc100
- - wikipedia
- widget:
- - text: "生命、宇宙、そして万物についての究極の疑問の答えは"
+ - Wikipedia
+ - mc4
+ inference: false
  ---

  # japanese-gpt-neox-small
@@ -40,7 +40,7 @@ model = GPTNeoXForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
  A 12-layer, 768-hidden-size transformer-based language model.

  # Training
- The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), [Japanese C4](https://huggingface.co/datasets/c4), and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective.
+ The model was trained on [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), [Japanese C4](https://huggingface.co/datasets/mc4), and [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) to optimize a traditional language modelling objective.

  # Tokenization
  The model uses a [sentencepiece](https://github.com/google/sentencepiece)-based tokenizer.
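
The commit also sets `inference: false`, which turns off the hosted inference widget, so the prompt it removes from the widget now has to be run locally. The third hunk's header quotes the README's loading line, `model = GPTNeoXForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")`; a minimal sketch of running that prompt follows. The tokenizer setup (`AutoTokenizer` with `use_fast=False`) and the sampling parameters are assumptions, not taken from this diff:

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# The model line matches the README's quoted usage; the tokenizer line is an
# assumption (AutoTokenizer with use_fast=False is the usual pattern for
# sentencepiece-based checkpoints).
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-small", use_fast=False)
model = GPTNeoXForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")

# The prompt removed from the widget in this commit; it translates to
# "The answer to the ultimate question of life, the universe, and everything is".
prompt = "生命、宇宙、そして万物についての究極の疑問の答えは"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=30,  # illustrative settings, not from the README
        do_sample=True,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```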
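
The Tokenization section states only that the tokenizer is [sentencepiece](https://github.com/google/sentencepiece)-based. A small sketch of inspecting that through the transformers API, again assuming `AutoTokenizer` with `use_fast=False`; the sample sentence is illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-small", use_fast=False)

text = "日本語のテキスト"  # "Japanese text"
pieces = tokenizer.tokenize(text)          # sentencepiece subword pieces
ids = tokenizer.encode(text)               # the corresponding vocabulary ids

print(pieces)
print(ids)
print(tokenizer.decode(ids))               # decodes back to readable text
```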