|
---
license: cc-by-4.0
---
|
|
|
This model is a RoBERTa model pre-trained on programming-language code: the WolfSSL library source together with examples of singletons mixed into Linux kernel code. The model is pre-trained, via masked language modeling, to understand the concept of a singleton in code.
|
|
|
The training data is C/C++, but inference can also be run on code written in other languages.
|
|
|
The model can be used to fill masked tokens in the following way:
|
|
|
```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='mstaron/SingBERTa')
unmasker("Hello I'm a <mask> model.")
```
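
Since the model was pre-trained on C/C++ code, masking a token inside a code fragment is a more representative example. The snippet below is an illustrative sketch (the class name `SingletonX1` is reused from the embedding example later in this card):

```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='mstaron/SingBERTa')

# ask the model to fill in the masked token of a C++ singleton declaration;
# the pipeline returns the top candidate tokens with their scores
unmasker("class SingletonX1 { static SingletonX1* <mask>; };")
```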
|
|
|
Embeddings for a downstream task can be obtained in the following way:
|
|
|
```python
# import the model via the huggingface library
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# load the tokenizer for the pre-trained SingBERTa
tokenizer = AutoTokenizer.from_pretrained('mstaron/SingBERTa')

# load the model
model = AutoModelForMaskedLM.from_pretrained("mstaron/SingBERTa")

# create the pipeline, which will extract the embedding vectors
# the model is already pre-trained, so we do not need to train anything here
features = pipeline(
    "feature-extraction",
    model=model,
    tokenizer=tokenizer,
    return_tensors=False
)

# extract the features == embeddings
lstFeatures = features('Class SingletonX1')

# print the first token's embedding (<s>, RoBERTa's counterpart of [CLS]),
# which is a reasonable approximation of the whole sentence embedding;
# an alternative is to average all token embeddings: np.mean(lstFeatures[0], axis=0)
lstFeatures[0][0]
```
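
When a single fixed-size vector per code snippet is preferred, one common option is mean pooling over all token embeddings, as mentioned in the comment above. A minimal sketch, assuming numpy is available:

```python
import numpy as np

# lstFeatures[0] is the list of per-token embedding vectors for the first input;
# averaging them yields one fixed-size embedding for the whole snippet
embedding = np.mean(lstFeatures[0], axis=0)
print(embedding.shape)  # e.g., (768,) for a base-size RoBERTa model
```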
|
|
|
In order to use the model for a downstream task, it first needs to be fine-tuned on that task.
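
As a minimal sketch of such fine-tuning, the snippet below frames singleton detection as binary sequence classification. This is an assumed setup, not part of the released model: the toy texts, labels, and hyperparameters are hypothetical placeholders for a real labelled dataset.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# hypothetical toy data -- replace with a real labelled dataset
texts = ["class SingletonX1 { static SingletonX1* instance; };",
         "int add(int a, int b) { return a + b; }"]
labels = [1, 0]  # 1 = singleton, 0 = not a singleton

tokenizer = AutoTokenizer.from_pretrained('mstaron/SingBERTa')
model = AutoModelForSequenceClassification.from_pretrained('mstaron/SingBERTa',
                                                           num_labels=2)

class CodeDataset(Dataset):
    """Wraps tokenized code snippets and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir='out', num_train_epochs=1),
    train_dataset=CodeDataset(texts, labels),
)
trainer.train()
```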