permediq/SapBERT-DE

femustafa committed on Apr 24

Commit c51606b (1 parent: f172dab): usage added

Files changed (1)

README.md +42 -1

@@ -8,4 +8,45 @@ tags:
 - umls
 ---
-SapBERT-DE is a model for German biomedical entity linking, obtained by fine-tuning the multilingual entity linking model [`cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR`](https://huggingface.co/cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR) on [UMLS-Wikidata](https://zenodo.org/records/11003203), a German biomedical entity linking knowledge base.
+SapBERT-DE is a model for German biomedical entity linking, obtained by fine-tuning the multilingual entity linking model [`cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR`](https://huggingface.co/cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR) on [UMLS-Wikidata](https://zenodo.org/records/11003203), a German biomedical entity linking knowledge base.
+# Usage
+```python
+import numpy as np
+from tqdm import tqdm
+import torch
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("permediq/SapBERT-DE", use_fast=True)
+model = AutoModel.from_pretrained("permediq/SapBERT-DE").cuda()
+# entity descriptions to embed
+entity_descriptions = ["Cerebellum", "Zerebellum", "Kleinhirn", "Anaesthesie"]
+bs = 32 # batch size
+all_embs = []
+for i in tqdm(np.arange(0, len(entity_descriptions), bs)):
+    toks = tokenizer.batch_encode_plus(entity_descriptions[i:i+bs],
+                                       padding="max_length",
+                                       max_length=40,  # the model was trained with max_length=40
+                                       truncation=True,
+                                       return_tensors="pt")
+    toks_cuda = {}
+    for k,v in toks.items():
+        toks_cuda[k] = v.cuda()
+    cls_rep = model(**toks_cuda)[0][:,0,:]
+    all_embs.append(cls_rep.cpu().detach())
+all_embs = torch.cat(all_embs)
+def cos_sim(a, b):
+    a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
+    b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
+    return torch.mm(a_norm, b_norm.transpose(0, 1))
+# cosine similarity of the first entity with every entity (including itself)
+print(cos_sim(all_embs[0].unsqueeze(0), all_embs))
+# >>> tensor([[1.0000, 0.9337, 0.6206, 0.2086]])
+```
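The usage snippet stops at pairwise similarities. For entity linking itself, SapBERT-style models are commonly used with nearest-neighbour search: each mention is embedded, compared with the embeddings of every name in a knowledge base, and assigned the concept ID of the most similar name. The sketch below illustrates that lookup under stated assumptions: the embeddings are plain NumPy arrays (for example `all_embs.numpy()` from the snippet above), `l2_normalize` and `link_mentions` are hypothetical helpers rather than part of the model's API, and the concept IDs and 3-dimensional toy vectors are placeholders, not real UMLS-Wikidata data.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so that dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # avoid division by zero for all-zero rows


def link_mentions(mention_embs: np.ndarray, kb_embs: np.ndarray,
                  kb_ids: list, top_k: int = 1) -> list:
    """Hypothetical helper: return the top_k (concept_id, cosine) pairs per mention.

    mention_embs: (n_mentions, dim) array of mention embeddings
    kb_embs:      (n_names, dim) array, one row per knowledge-base name
    kb_ids:       length-n_names list giving the concept ID of each name
    """
    # (n_mentions, n_names) matrix of cosine similarities
    scores = l2_normalize(mention_embs) @ l2_normalize(kb_embs).T
    # Sort each row from most to least similar; "stable" keeps ties in KB order
    order = np.argsort(-scores, axis=1, kind="stable")[:, :top_k]
    return [
        [(kb_ids[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real (much higher-dimensional) SapBERT-DE embeddings
    kb_embs = np.array([
        [1.0, 0.0, 0.0],  # e.g. "Kleinhirn"
        [0.9, 0.1, 0.0],  # e.g. "Zerebellum"
        [0.0, 1.0, 0.0],  # e.g. "Anaesthesie"
    ])
    kb_ids = ["CONCEPT_A", "CONCEPT_A", "CONCEPT_B"]  # hypothetical placeholder IDs
    mentions = np.array([[0.95, 0.05, 0.0], [0.1, 0.9, 0.0]])
    for hits in link_mentions(mentions, kb_embs, kb_ids, top_k=2):
        print(hits)
```

Because several names can share one concept ID (synonyms such as "Kleinhirn" and "Zerebellum"), a top-k list may repeat an ID; keeping only each concept's best score is a simple extension. For a large knowledge base, the dense matrix product could be replaced by an approximate nearest-neighbour index.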