eli4s committed on
Commit 805578d
1 Parent(s): e819f07

Create README.md

Files changed (1)
  1. README.md +7 -0
README.md ADDED
@@ -0,0 +1,7 @@
+ This model was pretrained on the bookcorpus dataset using knowledge distillation.
+
+ Although it shares the same architecture as BERT, this model has a hidden size of 384 (half of BERT-base's 768) and 6 attention heads (half of BERT-base's 12).
+ The knowledge distillation was performed using multiple loss functions.
+ The weights of the model were initialized from scratch.
+
+ PS: the tokenizer is the same as that of bert-base-uncased.
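
For context, here is a minimal sketch of how a BERT-style model with these dimensions could be instantiated from scratch with the `transformers` library. The hidden size (384) and head count (6) come from the README above; every other configuration value falls back to a `BertConfig` default and is an assumption here, not a detail of this model.

```python
from transformers import BertConfig, BertForMaskedLM

# Dimensions from the README: hidden size 384 (vs. 768 for BERT-base)
# and 6 attention heads (384 / 6 = 64-dim heads, as in BERT-base).
# All other values are BertConfig defaults and are assumptions.
config = BertConfig(hidden_size=384, num_attention_heads=6)

# Building from a config alone yields randomly initialized weights,
# matching the README's "initialized from scratch".
model = BertForMaskedLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```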
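The README says distillation used multiple loss functions without naming them. The sketch below shows one common two-term combination (temperature-softened KL divergence against the teacher's logits, plus the standard masked-language-modelling cross-entropy); this illustrates multi-loss distillation in general and is not the author's actual recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Illustrative two-term distillation objective; the losses actually
    used for this model are not specified in the README."""
    # Soft-target term: KL divergence between temperature-softened teacher
    # and student distributions over the shared WordPiece vocabulary.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    # Hard-target term: ordinary MLM cross-entropy against the true token
    # ids; -100 marks unmasked positions, which are ignored.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard
```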
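Since the tokenizer matches bert-base-uncased, loading the pair could look like the following; the model repository id is a hypothetical placeholder, as the README does not state it.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Per the README, the tokenizer is identical to bert-base-uncased's.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# "eli4s/<repo-name>" is a hypothetical placeholder; the README does not
# give the repository id.
model = AutoModelForMaskedLM.from_pretrained("eli4s/<repo-name>")

# Sanity-check the reduced architecture described above.
assert model.config.hidden_size == 384
assert model.config.num_attention_heads == 6

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
logits = model(**inputs).logits  # (batch, seq_len, vocab_size)
```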