Safetensors · Japanese · llama_enc · custom_code
Onely7 committed · Commit c9f6857 · 1 Parent(s): 442e792

Update README.md

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -8,7 +8,7 @@ language:
 
 # TohokuNLP BERT-alpha 500M
 
-長系列 (4096, 8192 トークン) の入力を可能にした日本語 [BERT](https://aclanthology.org/N19-1423/) モデルです。
+長系列 (4,096, 8,192 トークン) の入力を可能にした日本語 [BERT](https://aclanthology.org/N19-1423/) モデルです。
 
 ## 利用方法
 
@@ -90,7 +90,7 @@ Whole Word Masking 単語分割器には、[vibrato](https://github.com/daac-too
 | Batch Size (tokens) | 1,146,880 | 2,293,760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps | 10000 | N/A |
+| Learning Rate Warmup Steps | 10,000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
@@ -155,7 +155,7 @@ Whole Word Masking 単語分割器には、[vibrato](https://github.com/daac-too
 
 # TohokuNLP BERT-alpha 500M
 
-A Japanese [BERT](https://aclanthology.org/N19-1423/) model capable of processing long sequences (4096, 8192 tokens).
+A Japanese [BERT](https://aclanthology.org/N19-1423/) model capable of processing long sequences (4,096, 8,192 tokens).
 
 ## Usage
 
@@ -234,7 +234,7 @@ We only implemented Masked Language Modeling (MLM) during training, without Next
 | Batch Size (tokens) | 1,146,880 | 2,293,760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps | 10000 | N/A |
+| Learning Rate Warmup Steps | 10,000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
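The description edited in the first and third hunks advertises long-sequence input (4,096 or 8,192 tokens) and sits directly above the README's Usage section, which is not part of this diff. Below is a minimal usage sketch, assuming the standard `transformers` AutoClass API with `trust_remote_code=True` (suggested by the repository's `custom_code` tag); the repository id is a placeholder, not taken from this commit.

```python
# Minimal sketch of loading the model for masked language modeling.
# Assumptions (not from this commit): the repo id below is a placeholder,
# and the model's custom architecture is loaded via trust_remote_code=True.
from transformers import AutoTokenizer, AutoModelForMaskedLM

repo_id = "tohoku-nlp/bert-alpha-500m"  # placeholder repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(repo_id, trust_remote_code=True)

# The README describes 4,096- and 8,192-token variants, so truncate inputs
# to the maximum length of the variant in use.
text = "東北大学は仙台市にある。"  # example sentence
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```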
 
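The table rows touched by the remaining hunks document the pretraining optimizer: AdamW with a peak learning rate of 1.0E-4, betas (0.9, 0.999), eps 1.0E-8, a cosine schedule, and 10,000 warmup steps. A minimal sketch of an equivalent setup follows, assuming PyTorch and `transformers.get_cosine_schedule_with_warmup`; the total step count is a placeholder, and this helper decays to zero rather than to the 1.0E-6 minimum learning rate listed in the table.

```python
# Sketch of an optimizer/scheduler setup matching the table above
# (AdamW, max LR 1e-4, betas (0.9, 0.999), eps 1e-8, cosine schedule,
# 10,000 warmup steps). Assumptions: PyTorch plus transformers'
# get_cosine_schedule_with_warmup; the real training code is not shown
# in this commit, and the total step count below is a placeholder.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the actual model

optimizer = torch.optim.AdamW(
    model.parameters(), lr=1.0e-4, betas=(0.9, 0.999), eps=1.0e-8
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=50_000,  # placeholder total step count
)

# Per training step: optimizer.step() followed by scheduler.step().
```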