jacobfulano committed
Commit ce11e47
Parent(s): fdbb682
Update README.md
README.md
CHANGED
@@ -145,7 +145,7 @@ The learning rate schedule begins with a warmup to a maximum learning rate of 5.
 Warmup lasted for 6% of the full training duration. Global batch size was set to 4096, and microbatch size was 128; at this global batch size, full pretraining consisted of 70,000 batches.
 We set the maximum sequence length during pretraining to 128, and we used the standard embedding dimension of 768.
 For MosaicBERT, we applied 0.1 dropout to the feedforward layers but no dropout to the FlashAttention module, as this was not possible with the OpenAI Triton implementation.
-Full configuration details for pretraining MosaicBERT-Base can be found in the configuration yamls [in the mosaicml/examples repo](https://github.com/mosaicml/examples/
+Full configuration details for pretraining MosaicBERT-Base can be found in the configuration yamls [in the mosaicml/examples repo](https://github.com/mosaicml/examples/blob/main/examples/benchmarks/bert/yamls/main/mosaic-bert-base-uncased.yaml).
 
 
 ## Evaluation results
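For orientation, below is a minimal sketch of how the hyperparameters described in this hunk might be expressed in a Composer-style configuration YAML. The field names (`max_seq_len`, `global_train_batch_size`, `device_train_microbatch_size`, `t_warmup`, `max_duration`) are assumptions modeled on the mosaicml/examples BERT benchmark yamls, not the contents of the linked mosaic-bert-base-uncased.yaml.

```yaml
# Illustrative sketch only: field names are assumptions in the style of
# Composer / mosaicml-examples yamls, not copied from the linked file.

max_seq_len: 128                    # maximum sequence length during pretraining

model:
  name: mosaic_bert
  pretrained_model_name: bert-base-uncased   # standard 768-dim embeddings

# 4096 global / 128 microbatch = 32 gradient-accumulation steps per optimizer step
global_train_batch_size: 4096
device_train_microbatch_size: 128

scheduler:
  name: linear_decay_with_warmup
  t_warmup: 0.06dur                 # warmup for 6% of the full training duration

max_duration: 70000ba               # 70,000 batches at global batch size 4096
```

As a sanity check on the numbers: 4096 / 128 = 32 microbatches are accumulated per optimizer step, and 70,000 batches at 4096 samples each is roughly 287M training samples.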