This model was pretrained on the bookcorpus dataset using knowledge distillation.

What sets this model apart is that, although it shares the same architecture as BERT, it has a hidden size of 384 (half the hidden size of BERT base) and 6 attention heads.
The knowledge distillation was performed using multiple loss functions.
|
5 |
+
The weights of the model were initialized from scratch.
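As a rough illustration, the architecture described above corresponds to something like the following standard `BertConfig`. Only `hidden_size` and `num_attention_heads` are taken from this card; the number of layers and the intermediate size are not specified here, so the values below are only assumptions based on the usual BERT conventions.

```python
from transformers import BertConfig, BertModel

config = BertConfig(
    hidden_size=384,          # stated above: half of BERT base's 768
    num_attention_heads=6,    # stated above: 384 / 6 = 64-dimensional heads
    intermediate_size=1536,   # assumption: the usual 4 * hidden_size ratio
    num_hidden_layers=12,     # assumption: not specified in this card
    vocab_size=30522,         # bert-base-uncased vocabulary (same tokenizer, see PS below)
)
model = BertModel(config)     # randomly initialized, i.e. "from scratch"
print(f"{model.num_parameters():,} parameters")
```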
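The exact combination of distillation losses is not described in this card, so the snippet below is only a generic sketch of what "multiple loss functions" typically means in knowledge distillation (a soft-target KL term against the teacher plus a hard-target cross-entropy term), not the actual training recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between the temperature-scaled
    # teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Weighted sum of the two terms.
    return alpha * soft + (1.0 - alpha) * hard
```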
PS: the tokenizer is the same as the one used by bert-base-uncased.
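For example, the model can be loaded as follows; "path/to/this-model" is a placeholder for this repository's id or a local path, since the checkpoint name is not given in this card.

```python
from transformers import AutoModel, AutoTokenizer

# The tokenizer is the one of bert-base-uncased, as noted above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("path/to/this-model")  # placeholder path

inputs = tokenizer("Knowledge distillation compresses large language models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 384)
```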