Update README.md
README.md CHANGED

@@ -8,7 +8,7 @@ language:
 
 # TohokuNLP BERT-alpha 500M
 
-長系列 (
+長系列 (4,096, 8,192 トークン) の入力を可能にした日本語 [BERT](https://aclanthology.org/N19-1423/) モデルです。
 
 ## 利用方法
 
@@ -90,7 +90,7 @@ Whole Word Masking 単語分割器には、[vibrato](https://github.com/daac-too
 | Batch Size (tokens) | 1,146,880 | 2,293,760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps |
+| Learning Rate Warmup Steps | 10,000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
@@ -155,7 +155,7 @@ Whole Word Masking 単語分割器には、[vibrato](https://github.com/daac-too
 
 # TohokuNLP BERT-alpha 500M
 
-A Japanese [BERT](https://aclanthology.org/N19-1423/) model capable of processing long sequences (
+A Japanese [BERT](https://aclanthology.org/N19-1423/) model capable of processing long sequences (4,096, 8,192 tokens).
 
 ## Usage
 
@@ -234,7 +234,7 @@ We only implemented Masked Language Modeling (MLM) during training, without Next
 | Batch Size (tokens) | 1,146,880 | 2,293,760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps |
+| Learning Rate Warmup Steps | 10,000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
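For reference, the corrected row describes the first-stage schedule listed in the table: AdamW with beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8, a peak learning rate of 1.0E-4 decayed on a cosine schedule toward 1.0E-6 after 10,000 warmup steps. Below is a minimal PyTorch sketch of such a schedule; it is not the authors' training code, and the linear warmup shape and the total step count are assumptions not stated in the hunks shown here.

```python
# Sketch only: reproduces the hyperparameters from the updated table
# (AdamW, max LR 1.0e-4, min LR 1.0e-6, 10,000 warmup steps, cosine decay).
# TOTAL_STEPS is a placeholder; the README excerpt does not state it.
import math
import torch

MAX_LR = 1.0e-4
MIN_LR = 1.0e-6
WARMUP_STEPS = 10_000
TOTAL_STEPS = 100_000  # placeholder value, assumption

model = torch.nn.Linear(8, 8)  # stand-in for the actual BERT model
optimizer = torch.optim.AdamW(
    model.parameters(), lr=MAX_LR, betas=(0.9, 0.999), eps=1.0e-8
)

def lr_lambda(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    floor = MIN_LR / MAX_LR
    return floor + (1.0 - floor) * cosine

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# scheduler.step() would be called once per optimizer step during training.
```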