Plant_Disease_SWIN_BLIP2_Llama3.2_3B
This is a fine-tuned BLIP-2 model that integrates a Swin Transformer for vision and Llama 3.2 (3B) for language generation. It is optimized for plant disease-related visual question answering (VQA) tasks.
Model Overview
- Vision Backbone: Swin Transformer (microsoft/swin-tiny-patch4-window7-224)
- Language Model: Llama 3.2 (3B)
- Architecture: Built on the BLIP-2 framework; the model was fine-tuned using the training code available in the Custom-BLIP-2 GitHub repository (a quick sanity-check sketch follows this list).
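As a quick sanity check, the snippet below (a minimal sketch that assumes the model has already been loaded as shown in the Inference section) prints the classes of the two sub-models, so you can confirm that the vision tower is a Swin model and the text decoder is a Llama model:

# Assumes `model` was loaded with CustomBlip2ForConditionalGeneration.from_pretrained(...)
# as shown in the Inference section below.
print(type(model.vision_model).__name__)    # expected: SwinModel
print(type(model.language_model).__name__)  # the Llama 3.2 (3B) text decoder (e.g., LlamaForCausalLM)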
Inference
You can easily perform inference with this model using the Hugging Face transformers library.
Example Inference Code
from transformers import Blip2Processor, Blip2ForConditionalGeneration, SwinModel
from PIL import Image
# BLIP-2 subclass whose vision encoder is a Swin Transformer instead of the default ViT
class CustomBlip2ForConditionalGeneration(Blip2ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        self.vision_model = SwinModel(config.vision_config)
# Load the processor and model
processor = Blip2Processor.from_pretrained("raghavendrad60/Plant_Disease_SWIN_BLIP2_Llama3.2_3B")
model = CustomBlip2ForConditionalGeneration.from_pretrained("raghavendrad60/Plant_Disease_SWIN_BLIP2_Llama3.2_3B")
# Prepare an image and text input (e.g., a plant image and a relevant question)
image = Image.open("path_to_your_image.jpg")
text = "Q) Name plant and disease."
# Process the inputs
inputs = processor(image, text, return_tensors="pt", padding="max_length", max_length=512, truncation=True)
# Generate the answer tokens
outputs = model.generate(**inputs)
answer = processor.tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Answer:", answer)
Additional Information
- Training Code: The model was trained using the code available in the Custom-BLIP-2 GitHub repository.
- Usage: This model is designed for research purposes and can be used for plant disease detection and related VQA tasks. It leverages a robust vision encoder and language model to generate high-quality responses (a small batch example follows).
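For instance, a research workflow might run the same question over a folder of leaf photos. The sketch below assumes a hypothetical leaf_images/ directory of JPEG files and reuses the processor and model loaded in the Inference section:

from pathlib import Path
from PIL import Image

question = "Q) Name plant and disease."
for image_path in sorted(Path("leaf_images").glob("*.jpg")):  # hypothetical folder
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    generated_ids = model.generate(**inputs, max_new_tokens=64)
    answer = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    print(f"{image_path.name}: {answer}")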
Feel free to experiment with the model and share your feedback!