Update README.md
README.md CHANGED
@@ -33,6 +33,66 @@ Trained on wikipedia datasets:
Trained with the MLM and NSP pre-training scheme from [Large Batch Optimization for Deep Learning: Training BERT in 76 minutes](https://arxiv.org/abs/1904.00962).
Trained on 16 Graphcore Mk2 IPUs.

Command lines:

Phase 1:
```
python examples/language-modeling/run_pretraining.py \
  --config_name bert-base-uncased \
  --tokenizer_name bert-base-uncased \
  --do_train \
  --logging_steps 5 \
  --max_seq_length 128 \
  --ipu_config_name Graphcore/bert-base-ipu \
  --dataset_name Graphcore/wikipedia-bert-128 \
  --max_steps 10500 \
  --is_already_preprocessed \
  --dataloader_num_workers 64 \
  --dataloader_mode async_rebatched \
  --lamb \
  --lamb_no_bias_correction \
  --per_device_train_batch_size 32 \
  --gradient_accumulation_steps 512 \
  --learning_rate 0.006 \
  --lr_scheduler_type linear \
  --loss_scaling 16384 \
  --weight_decay 0.01 \
  --warmup_ratio 0.28 \
  --save_steps 100 \
  --config_overrides "layer_norm_eps=0.001" \
  --ipu_config_overrides "device_iterations=1" \
  --output_dir output-pretrain-bert-base-phase1
```
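
For reference, the number of sequences consumed per optimizer step in phase 1 can be worked out from the batching flags above. The Python sketch below assumes a replication factor of 4 (a 4-IPU pipeline replicated across the 16 IPUs); the actual value comes from the `Graphcore/bert-base-ipu` IPU config and any overrides.

```
# Rough sketch of the phase 1 global batch size, under the assumptions above.
per_device_train_batch_size = 32   # from --per_device_train_batch_size
gradient_accumulation_steps = 512  # from --gradient_accumulation_steps
device_iterations = 1              # from --ipu_config_overrides "device_iterations=1"
replication_factor = 4             # assumption: 16 IPUs with a 4-IPU pipeline

global_batch_size = (
    per_device_train_batch_size
    * gradient_accumulation_steps
    * device_iterations
    * replication_factor
)
print(global_batch_size)  # 65536 sequences per weight update under these assumptions
```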

Phase 2:
```
python examples/language-modeling/run_pretraining.py \
  --config_name bert-base-uncased \
  --tokenizer_name bert-base-uncased \
  --model_name_or_path ./output-pretrain-bert-base-phase1 \
  --do_train \
  --logging_steps 5 \
  --max_seq_length 512 \
  --ipu_config_name Graphcore/bert-base-ipu \
  --dataset_name Graphcore/wikipedia-bert-512 \
  --max_steps 2038 \
  --is_already_preprocessed \
  --dataloader_num_workers 128 \
  --dataloader_mode async_rebatched \
  --lamb \
  --lamb_no_bias_correction \
  --per_device_train_batch_size 8 \
  --gradient_accumulation_steps 512 \
  --learning_rate 0.002828 \
  --lr_scheduler_type linear \
  --loss_scaling 128.0 \
  --weight_decay 0.01 \
  --warmup_ratio 0.128 \
  --config_overrides "layer_norm_eps=0.001" \
  --ipu_config_overrides "device_iterations=1,embedding_serialization_factor=2,matmul_proportion=0.22" \
  --output_dir output-pretrain-bert-base-phase2
```
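
Once phase 2 completes, the checkpoint written to `output-pretrain-bert-base-phase2` should be loadable with the standard `transformers` API. The sketch below assumes the output directory holds a regular Hugging Face checkpoint; the fill-mask prompt is purely illustrative.

```
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumption: the phase 2 output directory contains a standard transformers checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("./output-pretrain-bert-base-phase2")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the most likely token at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```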

### Training hyperparameters

The following hyperparameters were used during phase 1 training: