lighteternal
/

BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli

Text Classification

textual-entailment

Inference Endpoints

Model card Files Files and versions Metrics Training metrics Community

BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli / README.md

lighteternal's picture

Update README.md

fdb646e about 3 years ago

|

history blame contribute delete

2.25 kB

	---
	language: en
	tags:
	- textual-entailment
	- nli
	- pytorch
	datasets:
	- mnli
	license: mit
	widget :
	- text: "EpCAM is overexpressed in breast cancer. </s></s> EpCAM is downregulated in breast cancer."
	---

	# BiomedNLP-PubMedBERT finetuned on textual entailment (NLI)

	The [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext?text=%5BMASK%5D+is+a+tumor+suppressor+gene) finetuned on the MNLI dataset. It should be useful in textual entailment tasks involving biomedical corpora.

	## Usage
	Given two sentences (a premise and a hypothesis), the model outputs the logits of entailment, neutral or contradiction.

	You can test the model using the HuggingFace model widget on the side:
	- Input two sentences (premise and hypothesis) one after the other.
	- The model returns the probabilities of 3 labels: entailment(LABEL:0), neutral(LABEL:1) and contradiction(LABEL:2) respectively.

	To use the model locally on your machine:
	```python
	# import torch
	# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")
	model = AutoModelForSequenceClassification.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")

	premise = 'EpCAM is overexpressed in breast cancer'
	hypothesis = 'EpCAM is downregulated in breast cancer.'

	# run through model pre-trained on MNLI
	x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
	truncation_strategy='only_first')
	logits = model(x)[0]

	probs = logits.softmax(dim=1)
	print('Probabilities for entailment, neutral, contradiction \n', np.around(probs.cpu().
	detach().numpy(),3))
	# Probabilities for entailment, neutral, contradiction
	# 0.001 0.001 0.998

	```

	## Metrics
	Evaluation on classification accuracy (entailment, contradiction, neutral) on MNLI test set:

	\| Metric \| Value \|
	\| --- \| --- \|
	\| Accuracy \| 0.8338\|

	See Training Metrics tab for detailed info.