Update README.md
satyaalmasian committed · Commit a2ee042 · Parent(s): 6feb3d6

README.md CHANGED
@@ -44,7 +44,7 @@ For Pretraining: 1 million weakly annotated samples from heideltime. The samples
 Fine-tuning: [Tempeval-3](https://www.cs.york.ac.uk/semeval-2013/task1/index.php%3Fid=data.html), Wikiwars, Tweets datasets. For the correct data versions please refer to our [repository](https://github.com/satya77/Transformer_Temporal_Tagger).
 
 # Training procedure
-The model is pre-trained on the weakly labeled data for $3$ epochs on the train set, from publicly available checkpoints on huggingface (`
+The model is pre-trained on the weakly labeled data for $3$ epochs on the train set, from publicly available checkpoints on huggingface (`bert-base-uncased`), with a batch size of 12. We use a learning rate of 5e-05 with an Adam optimizer and linear weight decay.
 Additionally, we use 2000 warmup steps.
 We fine-tune on the 3 benchmark datasets for 8 epochs with 5 different random seeds; this version of the model uses seed=4.
 The batch size and the learning rate are the same as in the pre-training setup, but the warm-up steps are reduced to 100.
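For reference, the hyperparameters described in the added line and the surrounding context can be expressed with the Hugging Face `Trainer` configuration. The sketch below is an illustration under stated assumptions, not the authors' actual training script (see their [repository](https://github.com/satya77/Transformer_Temporal_Tagger) for that): the model head, `num_labels`, and output directory names are placeholders, and the Trainer's default AdamW with a linear learning-rate schedule stands in for the README's "Adam optimizer and linear weight decay".

```python
# Minimal sketch of the described setup, assuming a token-classification head
# and placeholder label count / output directories; the real pipeline lives in
# the authors' repository.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    TrainingArguments,
)

# Start from the public `bert-base-uncased` checkpoint, as stated in the README.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=5,  # placeholder: the temporal tag set size depends on the data
)

# Pre-training on the weakly labeled data: 3 epochs, batch size 12,
# learning rate 5e-05, linear schedule, 2000 warmup steps.
pretrain_args = TrainingArguments(
    output_dir="weak_pretraining",
    num_train_epochs=3,
    per_device_train_batch_size=12,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    seed=4,  # the released checkpoint corresponds to seed 4
)

# Fine-tuning on TempEval-3 / Wikiwars / Tweets: 8 epochs, same batch size and
# learning rate, warmup reduced to 100 steps.
finetune_args = TrainingArguments(
    output_dir="benchmark_finetuning",
    num_train_epochs=8,
    per_device_train_batch_size=12,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    seed=4,
)
```

Each `TrainingArguments` object would then be passed to a `Trainer` together with the corresponding dataset; repeating the fine-tuning stage with `seed` set to each of the 5 values reproduces the seed sweep the README describes.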