=====CLIP-ViT-L-14-448px-MedICaT-ROCO=====

A biomedical CLIP model (ViT-L/14) trained at a higher input resolution of 448 px. It is intended as a starting point for medical image-text tasks such as retrieval and zero-shot classification.

Dataset: MedICaT-200k, ROCO-80k

Base model: pmc_vit_l_14 (https://huggingface.co/ryanyip7777/pmc_vit_l_14)

Training config:
img-size: 448
lr: 1.024e-6
epochs: 6
batch-size: 16

Benchmark: ROCO validation set (8,785 samples)

model	clip_val_loss	image_to_text_mean_rank	image_to_text_R@10	text_to_image_mean_rank	text_to_image_R@10
pmc_vit_l_14	0.6886	41.4641	0.6263	54.4236	0.6410
CLIP-ViT-L-14-448px-MedICaT-ROCO	0.3266	34.4018	0.6748	42.0458	0.6791
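
The retrieval columns above can be read as follows: for each query, all candidates are ranked by similarity; mean rank is the average position of the true match (lower is better), and R@10 is the fraction of queries whose true match lands in the top 10 (higher is better). Below is a minimal pure-Python sketch of that computation, assuming a square similarity matrix in which entry [i][j] scores image i against caption j and the correct pairs sit on the diagonal. The function name retrieval_metrics and the toy scores are illustrative; this is not the evaluation script used to produce the table.

```python
def retrieval_metrics(sim, k=10):
    """Mean rank and recall@k for a square similarity matrix.

    sim[i][j] is the similarity of query i to candidate j; the true
    match for query i is assumed to be candidate i (the diagonal).
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank 1 means the true match scored highest; ties are resolved optimistically.
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        "recall_at_k": sum(1 for r in ranks if r <= k) / n,
    }


# Toy 3x3 example: rows are images, columns are captions (hypothetical scores).
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
image_to_text = retrieval_metrics(sim, k=1)
# Text-to-image retrieval uses the transposed matrix: captions become the queries.
text_to_image = retrieval_metrics([list(col) for col in zip(*sim)], k=1)
print(image_to_text, text_to_image)
```

With real embeddings, sim would be the matrix of cosine similarities between all normalized image features and all normalized text features of the validation set.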

We use the open_clip code base (https://github.com/mlfoundations/open_clip).
To load this model, add your own model config under ./open_clip-main/src/open_clip/model_configs.
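
As an illustration of what such a config might contain, here is a sketch that writes an open_clip-style JSON config. It assumes the model keeps the stock ViT-L-14 layout shipped with open_clip and changes only the image size to 448. The file name ViT-L-14-448.json and the temporary output directory are hypothetical, and the values should be checked against the config published with the model before use.

```python
import json
import os
import tempfile

# Assumption: same architecture as open_clip's stock ViT-L-14 config,
# with image_size raised from 224 to 448. Verify against the model's own config.
config = {
    "embed_dim": 768,
    "vision_cfg": {"image_size": 448, "layers": 24, "width": 1024, "patch_size": 14},
    "text_cfg": {"context_length": 77, "vocab_size": 49408, "width": 768, "heads": 12, "layers": 12},
}

# Hypothetical file name; written to a temp dir here so the sketch has no side effects.
# In practice the file would go under src/open_clip/model_configs.
config_path = os.path.join(tempfile.mkdtemp(), "ViT-L-14-448.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=4)

# At 448 px with 14 px patches the vision tower sees a 32 x 32 grid of patches.
grid = config["vision_cfg"]["image_size"] // config["vision_cfg"]["patch_size"]
print(config_path, grid * grid)
```
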

import torch
from PIL import Image
import open_clip

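# Load the model, its preprocessing transform, and the matching tokenizer from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')
tokenizer = open_clip.get_tokenizer('hf-hub:luhuitong/CLIP-ViT-L-14-448px-MedICaT-ROCO')

# Run on the GPU when one is available; inputs must live on the same device as the model.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

image = preprocess(Image.open("xray.png")).unsqueeze(0).to(device)
text = tokenizer(["xray", "CT", "MRI"]).to(device)

# Mixed precision is only enabled on CUDA; on CPU the block runs in full precision.
with torch.no_grad(), torch.autocast(device_type=device, enabled=(device == "cuda")):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so that the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)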

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
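
The last step of the snippet above turns cosine similarities into label probabilities by scaling them by 100 and applying a softmax. The following pure-Python sketch of that arithmetic uses made-up similarity values (the helper name scaled_softmax is illustrative) to show how the scale sharpens small differences in similarity.

```python
import math


def scaled_softmax(similarities, scale=100.0):
    """Softmax over scale * similarity, computed stably by subtracting the max logit."""
    logits = [scale * s for s in similarities]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["xray", "CT", "MRI"]
cosine_sims = [0.30, 0.25, 0.20]  # hypothetical values, not model output
probs = scaled_softmax(cosine_sims)
best = labels[probs.index(max(probs))]
print(dict(zip(labels, probs)), best)
```

With these values, a gap of only 0.05 in cosine similarity already puts almost all of the probability mass on the top label.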