=====CLIP-ViT-L-14-448px-MedICaT-ROCO=====
Pretrained biomedical CLIP model with a higher input resolution (448 px). Suitable for a range of medical downstream tasks.
Dataset: MedICaT-200k, ROCO-80k
Base model: [https://huggingface.co/ryanyip7777/pmc_vit_l_14]
Training config:
img-size: 448
lr: 1.024e-6
epoch: 6
batchsize: 16
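To give a feel for what img-size 448 means for a ViT with 14 px patches (the "14" in the model name), the short sketch below counts the patch tokens per image. The helper name `vit_token_count` is hypothetical, and 224 is used only as the common ViT-L/14 default for comparison; the base model's resolution is not stated here.

```python
def vit_token_count(image_size, patch_size=14):
    """Number of patch tokens for a square image (class token not counted)."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2


tokens_448 = vit_token_count(448)  # 32 patches per side
tokens_224 = vit_token_count(224)  # 16 patches per side (comparison only)
print(tokens_448, tokens_224, tokens_448 / tokens_224)
```

Under these assumptions the image encoder processes four times as many patch tokens per image at 448 px, which is one likely reason for the small batch size.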
Benchmark: ROCO-validation-8785samples
model | clip_val_loss | image_to_text_mean_rank | image_to_text_R@10 | text_to_image_mean_rank | text_to_image_R@10 |
---|---|---|---|---|---|
pmc_vit_l_14 | 0.6886 | 41.4641 | 0.6263 | 54.4236 | 0.6410 |
CLIP-ViT-L-14-448px-MedICaT-ROCO | 0.3266 | 34.4018 | 0.6748 | 42.0458 | 0.6791 |
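For readers unfamiliar with the retrieval columns above, here is a minimal sketch of how mean rank and R@k are commonly computed from an image-text similarity matrix. It assumes the usual convention that row i and column i form the matching pair; the function name `retrieval_metrics` and the toy matrix are illustrative, and this is not necessarily the exact evaluation script used for the table.

```python
def retrieval_metrics(similarity, k=10):
    """Mean rank (1-based) and recall@k for a square similarity matrix.

    similarity[i][j] is the score between query i and candidate j;
    the correct candidate for query i is assumed to be index i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        # Sort candidate indices by descending score, then locate the true match.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranks.append(order.index(i))  # 0-based position of the correct candidate
    mean_rank = sum(ranks) / len(ranks) + 1
    recall_at_k = sum(r < k for r in ranks) / len(ranks)
    return mean_rank, recall_at_k


# Toy image-to-text similarity matrix: 3 images x 3 captions.
toy = [
    [0.9, 0.1, 0.2],  # image 0: correct caption ranked 1st
    [0.8, 0.5, 0.1],  # image 1: correct caption ranked 2nd
    [0.3, 0.2, 0.7],  # image 2: correct caption ranked 1st
]
mean_rank, recall_at_1 = retrieval_metrics(toy, k=1)
print(mean_rank, recall_at_1)
```

Text-to-image metrics follow from the same function applied to the transposed matrix. Lower mean rank and higher R@10 are better.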
We use the open_clip code base [https://github.com/mlfoundations/open_clip].
To load this model, add your own model config to ./open_clip-main/src/open_clip/model_configs.
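As a rough illustration of what such a config might look like, the sketch below writes a JSON file using the standard open_clip ViT-L-14 shapes with `image_size` raised to 448. Both the values and the file name `ViT-L-14-448.json` are assumptions, not the author's actual config; check them against the checkpoint before relying on them.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical config: standard ViT-L-14 shapes with image_size set to 448.
config = {
    "embed_dim": 768,
    "vision_cfg": {"image_size": 448, "layers": 24, "width": 1024, "patch_size": 14},
    "text_cfg": {"context_length": 77, "vocab_size": 49408, "width": 768, "heads": 12, "layers": 12},
}

# Written to a temporary directory here; in practice the target would be
# ./open_clip-main/src/open_clip/model_configs (the file name is an assumption).
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "ViT-L-14-448.json"
config_path.write_text(json.dumps(config, indent=4))
print(config_path)
```

open_clip picks up model names from the JSON file stems in that directory, so the file name chosen here would become the model name.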
import torch
from PIL import Image
import open_clip

# Load the model, its preprocessing transform, and the matching tokenizer from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
tokenizer = open_clip.get_tokenizer('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')

# Prepare one image (with a batch dimension) and the candidate text labels.
image = preprocess(Image.open("xray.png")).unsqueeze(0)
text = tokenizer(["xray", "CT", "MRI"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Scale the similarities and convert them to probabilities over the labels.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)