Update README.md

See [Zhang et al., 2021](https://arxiv.org/abs/2112.07887) for the details.
Note that some prior systems such as [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf), [SapBERT](https://aclanthology.org/2021.naacl-main.334.pdf), and their follow-up work (e.g., [Lai et al., 2021](https://aclanthology.org/2021.findings-emnlp.140.pdf)) claim to do entity linking, but they completely ignore the context of an entity mention and can only predict a surface form in the entity dictionary (see Figure 1 in [BioSyn](https://aclanthology.org/2020.acl-main.335.pdf)), _**not the canonical entity ID (e.g., CUI in UMLS)**_. As a result, they cannot disambiguate ambiguous mentions. For instance, given the entity mention "_ER_" in the sentence "*ER crowding has become a widespread problem*", these systems ignore the sentence context and simply predict the closest surface form, which is just "ER". Multiple entities share this surface form as a name or alias, such as *Emergency Room (C0562508)*, *Estrogen Receptor Gene (C1414461)*, and *Endoplasmic Reticulum (C0014239)*. Without the context, such systems cannot resolve this ambiguity and pinpoint the correct entity, *Emergency Room (C0562508)*. More problematically, their evaluation would deem such an ambiguous prediction correct, so the results reported in their papers do not reflect true entity-linking performance.
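The failure mode above can be sketched in a few lines. This is a toy illustration (the dictionary and function names are made up for this example, not any system's actual code): a context-free surface-form lookup returns every entity that lists "ER" as a name or alias, and nothing in the lookup can prefer one CUI over another.

```python
# Toy illustration: surface-form lookup alone cannot disambiguate "ER",
# because several UMLS entities share that surface form as a name or alias.
# (Dictionary and function names are hypothetical, for this example only.)
surface_to_cuis = {
    "ER": [
        "C0562508",  # Emergency Room
        "C1414461",  # Estrogen Receptor Gene
        "C0014239",  # Endoplasmic Reticulum
    ],
    "emergency room": ["C0562508"],
}

def link_by_surface_form(mention: str) -> list[str]:
    """Context-free lookup: returns every candidate CUI for the surface form.

    The sentence the mention appears in is never consulted, so all three
    candidates for "ER" are equally plausible to this lookup.
    """
    return surface_to_cuis.get(mention, [])

candidates = link_by_surface_form("ER")
print(candidates)  # three candidate CUIs; context is never used
```

A context-aware linker, by contrast, would score these candidates against the surrounding sentence and keep only *Emergency Room (C0562508)*.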
## Usage for Entity Linking
Here, we use the [MedMentions](https://github.com/chanzuckerberg/MedMentions) data to show you how to 1) **generate prototype embeddings**, and 2) **run entity linking**.
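The two steps can be sketched conceptually with toy vectors (this is an assumption-laden illustration with random embeddings and hypothetical names, not the repository's actual pipeline): a prototype embedding for each entity is formed by averaging the contextual embeddings of its mentions, and a new mention is linked to the entity whose prototype is most similar to the mention's context embedding.

```python
import numpy as np

# Conceptual sketch only: toy random vectors stand in for contextual mention
# embeddings. All names here are hypothetical, not the repo's actual API.
rng = np.random.default_rng(0)

# Pretend contextual mention embeddings, grouped by gold CUI.
mention_embs = {
    "C0562508": rng.normal(0.0, 1.0, size=(5, 8)) + 3.0,  # Emergency Room
    "C0014239": rng.normal(0.0, 1.0, size=(5, 8)) - 3.0,  # Endoplasmic Reticulum
}

# Step 1: prototype embedding = mean of each entity's mention embeddings,
# L2-normalized so that a dot product below is a cosine similarity.
prototypes = {}
for cui, embs in mention_embs.items():
    proto = embs.mean(axis=0)
    prototypes[cui] = proto / np.linalg.norm(proto)

# Step 2: link a new mention by cosine similarity to the prototypes.
def link(context_emb: np.ndarray) -> str:
    context_emb = context_emb / np.linalg.norm(context_emb)
    return max(prototypes, key=lambda cui: float(context_emb @ prototypes[cui]))

query = rng.normal(0.0, 1.0, size=8) + 3.0  # resembles an "Emergency Room" context
print(link(query))  # → C0562508 (nearest prototype)
```

Because the query embedding encodes the mention *in context*, the two-step prototype scheme can separate entities that share a surface form, which a context-free surface lookup cannot.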