---
language: python
datasets:
- code_search_net
---

roberta-python



This is a RoBERTa model pre-trained on the Python subset of the CodeSearchNet dataset for the masked language modeling (MLM) task.

To load the model (required packages: pip install transformers sentencepiece):

from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the tokenizer and masked language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("dbernsohn/roberta-python")
model = AutoModelForMaskedLM.from_pretrained("dbernsohn/roberta-python")

# Build a fill-mask pipeline that predicts the token behind <mask>.
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer
)

You can then use this model to fill in masked tokens in Python code.

code = """
new_dict = {}
for k, v in my_dict.<mask>():
    new_dict[k] = v**2
""".lstrip()

# Strip RoBERTa's "Ġ" space marker from each predicted token and keep its score.
pred = {x["token_str"].replace("Ġ", ""): x["score"] for x in fill_mask(code)}
sorted(pred.items(), key=lambda kv: kv[1], reverse=True)
# [('items', 0.7376779913902283),
# ('keys', 0.16238391399383545),
# ('values', 0.03965481370687485),
# ('iteritems', 0.03346433863043785),
# ('splitlines', 0.0032723243348300457)]
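
As an illustrative follow-up (not part of the original card), you can substitute the top-scoring token back into the snippet:

# Pick the highest-scoring token and fill it into the masked snippet.
best_token = max(pred, key=pred.get)
print(code.replace("<mask>", best_token))
# new_dict = {}
# for k, v in my_dict.items():
#     new_dict[k] = v**2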

The full training process and hyperparameters are available in my GitHub repo.
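
For orientation only, below is a minimal sketch of what masked language model training on CodeSearchNet could look like with the Hugging Face Trainer. The starting checkpoint (roberta-base), the func_code_string column name, and all hyperparameters are illustrative assumptions, not the actual recipe from the repo.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Assumption: "func_code_string" is the column holding the raw Python function code.
dataset = load_dataset("code_search_net", "python", split="train")

def tokenize(batch):
    return tokenizer(batch["func_code_string"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Randomly mask 15% of tokens for the masked language modeling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="roberta-python",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()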

Created by Dor Bernsohn
