File size: 1,504 Bytes
805578d
 
9873494
3c592ef
805578d
3c592ef
805578d
 
 
3c3ce4a
3756a49
3c3ce4a
 
 
 
3da2e1f
3c3ce4a
 
3da2e1f
a23785a
3c3ce4a
 
 
 
 
a23785a
 
422ee62
a23785a
408c74f
63245ee
 
3c3ce4a
63245ee
3c3ce4a
a23785a
 
 
3c3ce4a
422ee62
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
This model was pretrained on the bookcorpus dataset using knowledge distillation.

The particularity of this model is that even though it shares the same architecture as BERT, it has a hidden size of 384 (half the hidden size of BERT) and 6 attention heads (hence the same head size of BERT).

The knowledge distillation was performed using multiple loss functions.

The weights of the model were initialized from scratch.

PS : the tokenizer is the same as the one of the model bert-base-uncased.



To load the model \& tokenizer :

````python
from transformers import AutoModelForMaskedLM, BertTokenizer

model_name = "eli4s/Bert-L12-h384-A6"
model = AutoModelForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
````

To use it on a sentence :

````python
import torch

sentence = "Let's have a [MASK]."

model.eval()
inputs = tokenizer([sentence], padding='longest', return_tensors='pt')
output = model(inputs['input_ids'], attention_mask=inputs['attention_mask'])

mask_index = inputs['input_ids'].tolist()[0].index(103)
masked_token = output['logits'][0][mask_index].argmax(axis=-1)
predicted_token = tokenizer.decode(masked_token)

print(predicted_token)
````

Or we can also predict the n most relevant predictions :

````python
top_n = 5

vocab_size = model.config.vocab_size
logits = output['logits'][0][mask_index].tolist()
top_tokens = sorted(list(range(vocab_size)), key=lambda  i:logits[i], reverse=True)[:top_n]

tokenizer.decode(top_tokens)
````