Plant_Disease_SWIN_BLIP2_Llama3.2_3B
This is a fine-tuned BLIP-2 model that integrates a Swin Transformer for vision and Llama 3.2 (3B) for language generation. It is optimized for plant disease-related visual question answering (VQA) tasks.
Model Overview
- Vision Backbone: Swin Transformer (microsoft/swin-tiny-patch4-window7-224)
- Language Model: Llama 3.2 (3B)
- Architecture: Built on the BLIP-2 framework; the model was fine-tuned using the training code available in the Custom-BLIP-2 GitHub repository (a quick sanity-check sketch follows this list).
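As a quick sanity check, the snippet below (a minimal sketch that assumes the model has already been loaded as shown in the Inference section) prints the classes of the two sub-models, so you can confirm that the vision tower is a Swin model and the text decoder is a Llama model:

# Assumes `model` was loaded with CustomBlip2ForConditionalGeneration.from_pretrained(...)
# as shown in the Inference section below.
print(type(model.vision_model).__name__)    # expected: SwinModel
print(type(model.language_model).__name__)  # the Llama 3.2 (3B) text decoder (e.g., LlamaForCausalLM)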
Inference
You can easily perform inference with this model using the Hugging Face transformers library.
Example Inference Code
from transformers import Blip2Processor, Blip2ForConditionalGeneration, SwinModel
from PIL import Image
# BLIP-2 subclass whose vision encoder is a Swin Transformer instead of the default ViT
class CustomBlip2ForConditionalGeneration(Blip2ForConditionalGeneration):
    def __init__(self, config):
        super().__init__(config)
        self.vision_model = SwinModel(config.vision_config)
# Load the processor and model
processor = Blip2Processor.from_pretrained("raghavendrad60/Plant_Disease_SWIN_BLIP2_Llama3.2_3B")
model = CustomBlip2ForConditionalGeneration.from_pretrained("raghavendrad60/Plant_Disease_SWIN_BLIP2_Llama3.2_3B")
# Prepare an image and text input (e.g., a plant image and a relevant question)
image = Image.open("path_to_your_image.jpg")
text = "Q) Name plant and disease."
# Process the inputs
inputs = processor(image, text, return_tensors="pt", padding="max_length", max_length=512, truncation=True)
# Generate the answer tokens
outputs = model.generate(**inputs)
answer = processor.tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Answer:", answer)
Additional Information
- Training Code: The model was trained using the code available in the Custom-BLIP-2 GitHub repository.
- Usage: This model is designed for research purposes and can be used for plant disease detection and related VQA tasks. It leverages a robust vision encoder and language model to generate high-quality responses (a small batch example follows).
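For instance, a research workflow might run the same question over a folder of leaf photos. The sketch below assumes a hypothetical leaf_images/ directory of JPEG files and reuses the processor and model loaded in the Inference section:

from pathlib import Path
from PIL import Image

question = "Q) Name plant and disease."
for image_path in sorted(Path("leaf_images").glob("*.jpg")):  # hypothetical folder
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    generated_ids = model.generate(**inputs, max_new_tokens=64)
    answer = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    print(f"{image_path.name}: {answer}")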
Feel free to experiment with the model and share your feedback!