This is a masked language model built by fine-tuning a DistilBERT model on the IMDB dataset.
### Note:
I published a tutorial explaining how transformers work and how to train a masked language model using transformers. https://olafenwaayoola.medium.com/the-concept-of-transformers-and-training-a-transformers-model-45a09ae7fb50
# REST API Code for Testing the Masked Language Model
Python code for testing the masked language model through the Inference API.
``` python
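import requests

API_URL = "https://api-inference.huggingface.co/models/ayoolaolafenwa/Masked-Language-Model"
# Replace the token below with your own Hugging Face access token.
headers = {"Authorization": "Bearer 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"}

def query(payload):
    # Send the input text to the Inference API and return the parsed JSON response.
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Washington DC is the [MASK] of USA.",
})
# The first entry is the highest-scoring completion.
print(output[0]["sequence"])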
```
Output
```
washington dc is the capital of usa.
```
It produces the correct output, *washington dc is the capital of usa.*
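The response holds more than the single sequence printed above. As a minimal sketch, assuming the usual fill-mask response shape (a list of candidates with `score`, `token_str` and `sequence` keys) and assuming that failures come back as a dict with an `error` key, a small helper can list the top candidates. The helper name `top_predictions` and the sample response are hypothetical and hand-written for illustration; they are not real output from this model.

``` python
def top_predictions(response, k=3):
    """Return up to k (token_str, score) pairs from a fill-mask response."""
    # Assumption: a failed request is returned as a dict with an "error" key.
    if isinstance(response, dict) and "error" in response:
        raise RuntimeError(response["error"])
    # Sort by score so the helper does not depend on the order of the response.
    ranked = sorted(response, key=lambda item: item["score"], reverse=True)
    return [(item["token_str"], item["score"]) for item in ranked[:k]]


# Hypothetical response, hand-written for illustration (not real model output).
sample_response = [
    {"sequence": "washington dc is the capital of usa.", "score": 0.90, "token_str": "capital"},
    {"sequence": "washington dc is the center of usa.", "score": 0.05, "token_str": "center"},
    {"sequence": "washington dc is the heart of usa.", "score": 0.02, "token_str": "heart"},
]

for token, score in top_predictions(sample_response, k=2):
    print(f"{token}: {score:.2f}")
```

In practice you would pass the value returned by `query(...)` in place of `sample_response`.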
## Load the Masked Language Model with Transformers
You can load the masked language model with the transformers library using this code.
``` python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ayoolaolafenwa/Masked-Language-Model")
model = AutoModelForMaskedLM.from_pretrained("ayoolaolafenwa/Masked-Language-Model")

inputs = tokenizer("The internet [MASK] amazing.", return_tensors="pt")

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# pick the highest-scoring vocabulary token at the [MASK] position
predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
output = tokenizer.decode(predicted_token_id)
print(output)
```
Output
```
is
```
It prints out the predicted masked word *is*.
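The `argmax` step keeps only the single best token. As a rough sketch of how the scores at the `[MASK]` position could be turned into several ranked candidates, the example below applies a softmax and picks the top entries. It uses plain Python with a made-up four-word vocabulary and made-up logits (`toy_vocab`, `toy_logits`), so it runs without downloading the model; with the real model, the logits would come from `logits[0, mask_token_index]` and the tokens from `tokenizer.decode`.

``` python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, vocab, k=3):
    """Return the k most probable (token, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values for illustration only; they are not produced by the model.
toy_vocab = ["is", "was", "looks", "seems"]
toy_logits = [4.0, 2.0, 1.0, 0.5]

for token, prob in top_k(toy_logits, toy_vocab, k=2):
    print(f"{token}: {prob:.3f}")
```

The first pair returned by `top_k` matches what `argmax` would select; the remaining pairs are the runner-up candidates.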