	---
	license: mit
	language:
	- en
	tags:
	- medical
	- radiology
	model-index:
	- name: rate-ner-rad
	  results: []
	pipeline_tag: token-classification
	widget:
	- text: No suspicious focal mass lesion is seen in the left kidney.
	  example_title: Example in radiopaedia
	---

	# RaTE-NER-Deberta

	This model is a fine-tuned version of [DeBERTa](https://huggingface.co/microsoft/deberta-v3-base) on the [RaTE-NER](https://huggingface.co/datasets/Angelakeke/RaTE-NER/) dataset.

	## Model description

	This model was trained for use in the RaTEScore metric. If you are interested in our pipeline, please refer to our [paper](https://aclanthology.org/2024.emnlp-main.836.pdf) and [GitHub repository](https://github.com/Angelakeke/RaTEScore).

	The model can also be used to extract Abnormality, Non-Abnormality, Anatomy, Disease, and Non-Disease entities from radiology reports.

	## Usage

	<details>
	<summary>Click to expand a usage example for this model.</summary>
	<pre><code>
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch
	def post_process(tokenized_text, predicted_entities, tokenizer):
	    """Convert per-token BIO labels into (entity_text, entity_type) pairs."""
	    # Each span is (start, end, entity_type) with inclusive token indices.
	    entity_spans = []
	    start = end = None
	    entity_type = None

	    for i, (token, label) in enumerate(zip(tokenized_text, predicted_entities[:len(tokenized_text)])):
	        if token in ["[CLS]", "[SEP]"]:
	            continue
	        # Look one label ahead to decide whether the current token
	        # starts, continues, or closes an entity.
	        if label != "O" and i < len(predicted_entities) - 1:
	            if label.startswith("B-") and predicted_entities[i+1].startswith("I-"):
	                start = i
	                entity_type = label[2:]
	            elif label.startswith("B-") and predicted_entities[i+1].startswith("B-"):
	                start = i
	                end = i
	                entity_spans.append((start, end, label[2:]))
	                start = i
	                entity_type = label[2:]
	            elif label.startswith("B-") and predicted_entities[i+1].startswith("O"):
	                start = i
	                end = i
	                entity_spans.append((start, end, label[2:]))
	                start = end = None
	                entity_type = None
	            elif label.startswith("I-") and predicted_entities[i+1].startswith("B-"):
	                end = i
	                if start is not None:
	                    entity_spans.append((start, end, entity_type))
	                start = i
	                entity_type = label[2:]
	            elif label.startswith("I-") and predicted_entities[i+1].startswith("O"):
	                end = i
	                if start is not None:
	                    entity_spans.append((start, end, entity_type))
	                start = end = None
	                entity_type = None

	    # An entity still open at the end of the sequence runs up to the
	    # last token before the final [SEP].
	    if start is not None and end is None:
	        end = len(tokenized_text) - 2
	        entity_spans.append((start, end, entity_type))

	    # Turn token index spans back into text.
	    save_pair = []
	    for start, end, entity_type in entity_spans:
	        entity_str = tokenizer.convert_tokens_to_string(tokenized_text[start:end+1])
	        save_pair.append((entity_str, entity_type))

	    return save_pair

	def run_ner(texts, idx2label, tokenizer, model, device):
	    """Run NER on a list of sentences and return all (entity_text, entity_type) pairs."""
	    inputs = tokenizer(texts,
	                       max_length=512,
	                       padding=True,
	                       truncation=True,
	                       return_tensors="pt").to(device)
	    with torch.no_grad():
	        outputs = model(**inputs)
	    # Most likely label id at every token position.
	    predicted_labels = torch.argmax(outputs.logits, dim=2).tolist()
	    save_pairs = []
	    for i in range(len(texts)):
	        predicted_entities = [idx2label[label] for label in predicted_labels[i]]
	        # Drop padding tokens; this slicing assumes padding is on the right.
	        non_pad_mask = inputs["input_ids"][i] != tokenizer.pad_token_id
	        non_pad_length = non_pad_mask.sum().item()
	        non_pad_input_ids = inputs["input_ids"][i][:non_pad_length]
	        tokenized_text = tokenizer.convert_ids_to_tokens(non_pad_input_ids)
	        save_pair = post_process(tokenized_text, predicted_entities, tokenizer)
	        save_pairs.extend(save_pair)
	    return save_pairs

	ner_labels = ['B-ABNORMALITY', 'I-ABNORMALITY',
	              'B-NON-ABNORMALITY', 'I-NON-ABNORMALITY',
	              'B-DISEASE', 'I-DISEASE',
	              'B-NON-DISEASE', 'I-NON-DISEASE',
	              'B-ANATOMY', 'I-ANATOMY',
	              'O']
	idx2label = {i: label for i, label in enumerate(ner_labels)}

	tokenizer = AutoTokenizer.from_pretrained('Angelakeke/RaTE-NER-Deberta')
	model = AutoModelForTokenClassification.from_pretrained('Angelakeke/RaTE-NER-Deberta')

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)
	model.eval()

	# We recommend running inference one sentence at a time.

	text = ""  # replace with the text of your radiology report

	texts = text.split('. ')
	save_pair = run_ner(texts, idx2label, tokenizer, model, device)

	</code></pre>

	</details>
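
The usage example splits the report with `text.split('. ')`, which removes the period from every sentence except the last and returns a single empty string for an empty report. The block below is a minimal sketch of a slightly more careful splitter, assuming sentences end in `.`, `!` or `?` followed by whitespace. `split_sentences` is a hypothetical helper, not part of RaTEScore, and the sample report is made up for illustration.

```python
import re


def split_sentences(report):
    """Split a report into sentences on '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to each sentence.
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    # Drop empty fragments, e.g. when the report is an empty string.
    return [part for part in parts if part]


# Made-up report text, used only to show the splitting behaviour.
report = "No suspicious focal mass lesion is seen in the left kidney. The liver is unremarkable."
sentences = split_sentences(report)
print(sentences)
```

The resulting list could be passed to `run_ner` in place of `texts`. A decimal such as `1.5 cm` is not split because no whitespace follows the period, but abbreviations like `approx. 2 cm` would be, so a dedicated sentence segmenter may be preferable for real reports.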

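`post_process` in the usage example turns per-token BIO labels (`B-` begins an entity, `I-` continues it, `O` is outside any entity) into entity strings. The idea can be tried on a toy sequence without loading the model. This is a simplified sketch, not the author's function: it assumes whole words instead of subword tokens, joins them with spaces, and discards an `I-` tag that does not continue an open entity. `decode_bio` is a hypothetical name.

```python
def decode_bio(tokens, labels):
    """Group BIO-tagged tokens into (text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if any.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
    flush()
    return entities


# Hand-written labels for illustration only, not actual model output.
tokens = ["No", "mass", "lesion", "in", "the", "left", "kidney", "."]
labels = ["O", "B-NON-ABNORMALITY", "I-NON-ABNORMALITY", "O", "O",
          "B-ANATOMY", "I-ANATOMY", "O"]
print(decode_bio(tokens, labels))
```

The return value has the same `(entity_text, entity_type)` shape that `run_ner` produces, so downstream code written against one should work with the other.
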

	## Author

	Author: [Weike Zhao](https://angelakeke.github.io/)

	If you have any questions, please feel free to contact zwk0629@sjtu.edu.cn.

	## Citation
	```bibtex
	@inproceedings{zhao2024ratescore,
	  title={RaTEScore: A Metric for Radiology Report Generation},
	  author={Zhao, Weike and Wu, Chaoyi and Zhang, Xiaoman and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
	  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
	  pages={15004--15019},
	  year={2024}
	}
	```