---
base_model:
- cointegrated/rubert-tiny2
datasets:
- Mykes/patient_queries_ner_SDDCS
language:
- ru
library_name: transformers
tags:
- biology
- medical
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63565a3d58acee56a457f799/GiHEZBESael_bPiVtzmVD.jpeg)

# rubert_ner_SDDCS

SDDCS is an abbreviation of the NER entity types SYMPTOMS, DISEASES, DRUGS, CITIES, SUBWAY STATIONS (additionally, the model can predict GENDER and AGE entities).

This is a fine-tuned Named Entity Recognition (NER) model based on the [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2) model with only 29.4M parameters, designed to detect Russian medical entities such as diseases, drugs, symptoms, and more.

# rubert_ner_SDDCS

The med_ner_SDDCS model extracts named entities from patient queries. The abbreviation SDDCS refers to the list of entities (S: symptoms, D: diseases, D: drugs, C: city, S: subway station). The model also tags GENDER (an indication of sex) and AGE (an indication of age).

The model is based on the compact rubert-tiny2 model with 29.4 million parameters, which makes it well suited to running on a server with modest hardware requirements.

# Model Details

- Model Name: rubert_ner_SDDCS
- Base Model: cointegrated/rubert-tiny2
- Fine-tuned on: [Mykes/patient_queries_ner_SDDCS](https://huggingface.co/datasets/Mykes/patient_queries_ner_SDDCS)

## Entities Recognized:

- GENDER (e.g., женщина, мужчина) 👩👨
- DISEASE (e.g., паническое расстройство, грипп, ...) 🤒
- SYMPTOM (e.g., тревога, одышка, ...) 🩺
- SPECIALITY (e.g., невролог, кардиолог, ...) 👩‍⚕️
- CITY (e.g., Тула, Москва, Иркутск, ...) 🏙️
- SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...) 🚇
- DRUG (e.g., кардиомагнил, ципралекс) 💊
- AGE (e.g., ребенок, пожилой) 🧒🏼👴

## Model Performance

The fine-tuned model achieved the following performance metrics:

```
              precision    recall  f1-score   support

         AGE       1.00      1.00      1.00       583
        CITY       1.00      1.00      1.00      5244
     DISEASE       0.99      1.00      1.00      6569
        DRUG       1.00      1.00      1.00      8220
      GENDER       1.00      1.00      1.00       664
  SPECIALITY       1.00      0.98      0.99      4207
      SUBWAY       1.00      1.00      1.00      1084
     SYMPTOM       1.00      1.00      1.00      8979

   micro avg       1.00      1.00      1.00     35550
   macro avg       1.00      1.00      1.00     35550
weighted avg       1.00      1.00      1.00     35550
```

## When to use

You can use this model with the Hugging Face transformers 🤗 library to perform Named Entity Recognition (NER) tasks in the Russian medical domain, mainly for patient queries. Here's how to load and use the model:

## Load the tokenizer and model with transformers

```
from transformers import pipeline

# Build a token-classification pipeline that merges word pieces into whole entities
pipe = pipeline(task="ner", model='Mykes/rubert_ner_SDDCS', tokenizer='Mykes/rubert_ner_SDDCS', aggregation_strategy="max")

# The misspelled words in the query are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи психиатра в районе метро Октбрьской."
pipe(query.lower())
```

Result:

```
[{'entity_group': 'AGE', 'score': 0.99993, 'word': 'ребенка', 'start': 2, 'end': 9},
 {'entity_group': 'SYMPTOM', 'score': 0.9885457, 'word': 'треога', 'start': 10, 'end': 16},
 {'entity_group': 'SYMPTOM', 'score': 0.9934536, 'word': 'норушения сна', 'start': 19, 'end': 32},
 {'entity_group': 'SYMPTOM', 'score': 0.9999765, 'word': 'потеря сознания', 'start': 34, 'end': 49},
 {'entity_group': 'DISEASE', 'score': 0.999972, 'word': 'паническое расстройство', 'start': 66, 'end': 89},
 {'entity_group': 'SPECIALITY', 'score': 0.85958296, 'word': 'психиатра', 'start': 100, 'end': 109},
 {'entity_group': 'SUBWAY', 'score': 0.9955049, 'word': 'октбрьской', 'start': 125, 'end': 135}]
```

## How to render

```
import spacy
from spacy import displacy
from IPython.display import display, HTML

def convert_to_displacy_format(text, ner_results):
    entities = []
    for result in ner_results:
        # Convert the Hugging Face output into the format displacy expects
        entities.append({
            "start": result['start'],
            "end": result['end'],
            "label": result['entity_group']
        })
    return {
        "text": text,
        "ents": entities,
        "title": None
    }

query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство, принимал атаракс. Подскажи хорошего психиатра в районе метро Октбрьской."
# The model is run on the lowercased query; the offsets are then applied to the original text
ner_results = pipe(query.lower())
displacy_data = convert_to_displacy_format(query, ner_results)

colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)"
}
# Only the labels listed in "ents" are highlighted; GENDER is not in this list
options = {"ents": ["SPECIALITY", "CITY", "DRUG", "DISEASE", "SYMPTOM", "AGE", "SUBWAY"], "colors": colors}

# manual=True lets displacy render a plain dict instead of a spaCy Doc
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)

with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

display(HTML(html))
```
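
## Grouping the results by entity type

For downstream use it can be handy to collect the pipeline output per label. The block below is a minimal sketch, not part of the model's API: `group_entities` and its `min_score` parameter are hypothetical names introduced here, and the 0.9 threshold is only illustrative. It assumes that lowercasing the query does not change its length (true for the Cyrillic example above), so the `start`/`end` offsets can be used to slice the original-case text. The sample list is copied from the result shown earlier, so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(text, ner_results, min_score=0.0):
    """Group aggregated NER results by label (hypothetical helper).

    Surface forms are sliced from `text` using the start/end offsets,
    so the original casing is kept even if the model saw lowercased input.
    """
    grouped = defaultdict(list)
    for item in ner_results:
        # Skip low-confidence predictions
        if item["score"] < min_score:
            continue
        grouped[item["entity_group"]].append(text[item["start"]:item["end"]])
    return dict(grouped)


query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи психиатра в районе метро Октбрьской."

# Sample output copied from the "Result" block of this model card
sample = [
    {"entity_group": "AGE", "score": 0.99993, "word": "ребенка", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "score": 0.9885457, "word": "треога", "start": 10, "end": 16},
    {"entity_group": "SYMPTOM", "score": 0.9934536, "word": "норушения сна", "start": 19, "end": 32},
    {"entity_group": "SYMPTOM", "score": 0.9999765, "word": "потеря сознания", "start": 34, "end": 49},
    {"entity_group": "DISEASE", "score": 0.999972, "word": "паническое расстройство", "start": 66, "end": 89},
    {"entity_group": "SPECIALITY", "score": 0.85958296, "word": "психиатра", "start": 100, "end": 109},
    {"entity_group": "SUBWAY", "score": 0.9955049, "word": "октбрьской", "start": 125, "end": 135},
]

# Illustrative threshold: keep only predictions scored 0.9 or higher
entities = group_entities(query, sample, min_score=0.9)
for label, values in entities.items():
    print(label, values)
```

With a live pipeline, `sample` would be replaced by `pipe(query.lower())`. Slicing the original query rather than reading the `word` field keeps the capitalization of names such as the subway station.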