seopbo committed on
Commit
cdf55ff
1 Parent(s): 1a1d527

Update README.md

  1. README.md +24 -5
README.md CHANGED
@@ -10,16 +10,35 @@ widget:
  ---

  # LASSL roberta-ko-small
+ ## How to use
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ model = AutoModel.from_pretrained("lassl/roberta-ko-small")
+ tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
+ ```
+
+ ## Evaluation
  `roberta-ko-small` is a Korean language model pretrained with the [LASSL](https://github.com/lassl/lassl) framework. The performance below was evaluated on 2021/12/15.

  | nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
  | ---- | -------- | -------- | --------- | -------- | --- |
  | 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |

- ## How to use
+ ## Corpora
+ This model was trained on 6,860,062 examples (3,512,351,744 tokens in total), extracted from the corpora below. For details of the training setup, see `config.json`.

- ```python
- from transformers import AutoModel, AutoTokenizer
- model = AutoModel.from_pretrained("lassl/roberta-ko-small")
- tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
+ ```bash
+ corpora/
+ ├── [707M] kowiki_latest.txt
+ ├── [ 26M] modu_dialogue_v1.2.txt
+ ├── [1.3G] modu_news_v1.1.txt
+ ├── [9.7G] modu_news_v2.0.txt
+ ├── [ 15M] modu_np_v1.1.txt
+ ├── [1008M] modu_spoken_v1.2.txt
+ ├── [6.5G] modu_written_v1.0.txt
+ └── [413M] petition.txt
  ```
+
+
+
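
The figures in the updated README can be sanity-checked with a few lines of arithmetic. The sketch below is not part of the commit. It assumes that `avg` is the unweighted mean of the five task scores and that all training examples have the same length; the README states neither, and the variable names are illustrative.

```python
# Scores copied from the Evaluation table in the README.
scores = {
    "nsmc": 87.8846,
    "klue_nli": 66.3086,
    "klue_sts": 83.8353,
    "korquadv1": 83.1780,
    "klue_mrc": 42.4585,
}
reported_avg = 72.7330

# Assumption: "avg" is the unweighted arithmetic mean of the five scores.
mean_score = sum(scores.values()) / len(scores)

# Corpus statistics copied from the Corpora section.
num_examples = 6_860_062
num_tokens = 3_512_351_744

# Assumption: every example has the same number of tokens,
# so the remainder tells us whether the totals divide evenly.
tokens_per_example, remainder = divmod(num_tokens, num_examples)

print(f"mean score: {mean_score:.4f} (reported {reported_avg:.4f})")
print(f"tokens per example: {tokens_per_example}, remainder: {remainder}")
```

Under these assumptions the mean matches the reported `avg`, and the token count divides evenly into 512 tokens per example. That would be consistent with fixed-length 512-token training sequences, although `config.json` remains the authoritative source for the training setup.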