
Combined Multimodal Model

This model performs medical image classification and report generation using a custom architecture that combines a 3D video backbone with a biomedical text generation model.

Model Details

  • Architecture: Custom model combining a 3D ResNet video backbone (r3d_18) with BioBART; a minimal sketch of the wrapper classes is shown after this list.
  • Tasks:
    • Classification: Classifies medical images into one of four classes: acute, normal, chronic, or lacunar.
    • Report Generation: Generates medical reports based on the input images.
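
CombinedModel and ImageToTextProjector are defined in the repository's model.py, which is not reproduced on this card. The snippet below is a minimal sketch of what those wrapper classes might look like, inferred from the usage example that follows; the actual implementation (hidden sizes, classification head, and how the projected features are fed into BioBART) may differ.

import torch.nn as nn

class ImageToTextProjector(nn.Module):
    # Hypothetical: maps 512-dim r3d_18 features into BioBART's d_model-sized embedding space
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, video_features):
        return self.proj(video_features)  # (batch, in_dim) -> (batch, out_dim)

class CombinedModel(nn.Module):
    # Hypothetical wrapper around the video backbone, a 4-way classifier, and BioBART
    def __init__(self, video_model, report_generator, num_classes, projector):
        super().__init__()
        self.video_model = video_model            # r3d_18 with fc replaced by a 512-dim projection
        self.report_generator = report_generator  # BioBART seq2seq model
        self.projector = projector
        self.classifier = nn.Linear(512, num_classes)  # acute / normal / chronic / lacunar

    def forward(self, videos, labels=None):
        features = self.video_model(videos)   # (batch, 512)
        logits = self.classifier(features)    # classification logits
        # Treat the projected features as a one-token encoder input for BioBART
        encoder_inputs = self.projector(features).unsqueeze(1)
        gen_outputs = None
        if labels is not None:
            gen_outputs = self.report_generator(inputs_embeds=encoder_inputs, labels=labels)
        return logits, gen_outputs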

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from model import CombinedModel, ImageToTextProjector
from torchvision import models

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("YOUR_HF_USERNAME/combined-multimodal-model")

# Initialize models
video_model = models.video.r3d_18(pretrained=True)  # Kinetics-pretrained 3D ResNet backbone
video_model.fc = torch.nn.Linear(video_model.fc.in_features, 512)  # replace the head with a 512-dim feature output

report_generator = AutoModelForSeq2SeqLM.from_pretrained("GanjinZero/biobart-v2-base")  # BioBART report generator

projector = ImageToTextProjector(512, report_generator.config.d_model)  # map video features into BioBART's embedding space

num_classes = 4
combined_model = CombinedModel(video_model, report_generator, num_classes, projector)

# Load state dict
state_dict = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/YOUR_HF_USERNAME/combined-multimodal-model/resolve/main/pytorch_model.bin",
    map_location=torch.device('cpu')
)
combined_model.load_state_dict(state_dict)
combined_model.eval()

# Now you can use combined_model for inference
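
# A hypothetical end-to-end inference example. It assumes the sketch of
# CombinedModel above (a `classifier` head plus a projector into BioBART's
# embedding space); the real interface in model.py may differ.
class_names = ["acute", "normal", "chronic", "lacunar"]

# Dummy clip shaped (batch, channels, frames, height, width), as r3d_18 expects
video = torch.randn(1, 3, 16, 112, 112)

with torch.no_grad():
    features = video_model(video)                 # (1, 512) video features
    logits = combined_model.classifier(features)  # assumes a `classifier` attribute
    predicted_class = class_names[logits.argmax(dim=-1).item()]

    # Project features into BioBART's embedding space and decode a report
    encoder_inputs = projector(features).unsqueeze(1)  # (1, 1, d_model)
    report_ids = report_generator.generate(inputs_embeds=encoder_inputs, max_length=128)
    report = tokenizer.decode(report_ids[0], skip_special_tokens=True)

print("Predicted class:", predicted_class)
print("Generated report:", report)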