batterydata committed
Commit e675b6d
Parent(s): 76be348
Update README.md

README.md CHANGED

@@ -30,6 +30,33 @@ This way, the model learns an inner representation of the English language that
 useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
 classifier using the features produced by the BERT model as inputs.
 
+## Training data
+
+The BatteryOnlyBERT model was pretrained on the full text of battery papers only. The paper corpus contains 1.87B tokens from a total of 400,366 battery research papers published from 2000 to June 2021 by the publishers Royal Society of Chemistry (RSC), Elsevier, and Springer. The list of DOIs can be found at [GitHub](https://github.com/ShuHuang/batterybert/blob/main/corpus.txt).
+
+## Training procedure
+
+### Preprocessing
+
+The texts are lowercased and tokenized using WordPiece and a vocabulary size of 28,996. The inputs of the model are
+then of the form:
+
+```
+[CLS] Sentence A [SEP] Sentence B [SEP]
+```
+
+The details of the masking procedure for each sentence are the following:
+- 15% of the tokens are masked.
+- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
+- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
+- In the remaining 10% of the cases, the masked tokens are left as is.
+
+### Pretraining
+
+
+The model was trained on 8 NVIDIA DGX A100 GPUs for 1,500,000 steps with a batch size of 256. The sequence length was limited to 512 tokens. The optimizer used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
+learning rate warmup for 10,000 steps and linear decay of the learning rate after.
+
 ## Intended uses & limitations
 
 You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
@@ -72,33 +99,6 @@ encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
 ```
 
-## Training data
-
-The BatteryOnlyBERT model was pretrained on the full text of battery papers only. The paper corpus contains a total of 400,366 battery research papers published from 2000 to June 2021 by the publishers Royal Society of Chemistry (RSC), Elsevier, and Springer. The list of DOIs can be found at [GitHub](https://github.com/ShuHuang/batterybert/blob/main/corpus.txt).
-
-## Training procedure
-
-### Preprocessing
-
-The texts are lowercased and tokenized using WordPiece and a vocabulary size of 28,996. The inputs of the model are
-then of the form:
-
-```
-[CLS] Sentence A [SEP] Sentence B [SEP]
-```
-
-The details of the masking procedure for each sentence are the following:
-- 15% of the tokens are masked.
-- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
-- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
-- In the remaining 10% of the cases, the masked tokens are left as is.
-
-### Pretraining
-
-
-The model was trained on 8 NVIDIA DGX A100 GPUs for 1,500,000 steps with a batch size of 256. The sequence length was limited to 512 tokens. The optimizer used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
-learning rate warmup for 10,000 steps and linear decay of the learning rate after.
-
 ## Evaluation results
 
 Final loss: 1.0614.
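
The `[CLS] Sentence A [SEP] Sentence B [SEP]` input format described in the added Preprocessing section can be reproduced with the `transformers` tokenizer. The sketch below is illustrative only; the checkpoint name is an assumption and should be replaced with the actual BatteryOnlyBERT model ID.

```python
# Minimal sketch of the [CLS] Sentence A [SEP] Sentence B [SEP] input format.
# The model ID below is an assumption, not confirmed by this commit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("batterydata/batteryonlybert-uncased")  # assumed ID

sentence_a = "The cathode material was LiFePO4."
sentence_b = "The cell retained most of its capacity after 500 cycles."

# Encoding a sentence pair inserts the special tokens automatically.
encoded = tokenizer(sentence_a, sentence_b)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print("vocabulary size:", tokenizer.vocab_size)  # 28,996 according to the card
```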
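
The masking rules listed under Preprocessing follow the standard BERT scheme and can be written out explicitly. The function below is a minimal PyTorch sketch, not the actual pretraining code; in particular it does not enforce that a random replacement differs from the original token, and it does not exclude special tokens.

```python
# Illustrative implementation of the masking rules: 15% of tokens are selected;
# of those, 80% become [MASK], 10% become a random token, 10% are left unchanged.
# Modifies input_ids in place and returns it together with the MLM labels.
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, vocab_size: int,
                mlm_probability: float = 0.15):
    labels = input_ids.clone()

    # Select ~15% of positions as prediction targets.
    masked_indices = torch.bernoulli(torch.full(input_ids.shape, mlm_probability)).bool()
    labels[~masked_indices] = -100  # non-selected positions are ignored by the loss

    # 80% of the selected positions are replaced by [MASK].
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked_indices
    input_ids[replaced] = mask_token_id

    # Half of the remaining selected positions (10% overall) get a random token.
    randomized = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked_indices & ~replaced
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]

    # The final 10% of selected positions keep their original token.
    return input_ids, labels
```

In practice, the same behaviour (including skipping special tokens) is available off the shelf via `transformers.DataCollatorForLanguageModeling` with `mlm_probability=0.15`.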
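
The optimizer settings in the Pretraining section (Adam, learning rate 1e-4, betas (0.9, 0.999), weight decay 0.01, 10,000 warmup steps, then linear decay over 1,500,000 steps) map onto a standard setup. The sketch below assumes a `model` object already constructed elsewhere and uses `AdamW` to apply the stated weight decay; it is not the training script used for this checkpoint.

```python
# Sketch of the optimizer and learning-rate schedule described above.
# `model` is assumed to be a BERT-style model constructed elsewhere.
import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),
    weight_decay=0.01,
)

# 10,000 warmup steps, then linear decay to zero over 1,500,000 total steps.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10_000,
    num_training_steps=1_500_000,
)

# In the training loop, call scheduler.step() after each optimizer.step().
```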
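
As noted at the top of the diff, the features produced by the encoder can serve as inputs to a standard classifier. The snippet below is a minimal PyTorch sketch of that workflow; the model ID is again an assumption.

```python
# Extract fixed-size sentence features for a downstream classifier.
# The model ID is an assumption, not confirmed by this commit.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "batterydata/batteryonlybert-uncased"  # assumed ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "The anode was graphite.",
    "The electrolyte was 1 M LiPF6 in EC/DMC.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Use the [CLS] token representation as the sentence-level feature vector.
features = outputs.last_hidden_state[:, 0, :]  # shape: (batch_size, hidden_size)
# `features` can now be fed to any standard classifier, e.g. logistic regression.
```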