instructblip-vicuna-7b-peft-lora

This model is a fine-tuned version of Salesforce/instructblip-vicuna-7b, trained on the pantheon-prompts-dataset. It achieves the following results on the evaluation set:

Loss: 5.3583
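
For easier reading, the loss can be converted to a perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats; the card does not state how the loss was computed, so treat the result as indicative only.

```python
import math

# Assumption: the evaluation loss is a mean per-token cross-entropy in nats.
eval_loss = 5.3583

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```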

Model Description

Project Overview

This model is part of a two-phase project aimed at automatic prompt engineering for text-to-image generation.

Current Phase: Supervised Fine-Tuning

Status: Completed
Input: Base prompt and an image
Output: Enhanced prompt for image generation
Purpose: Adapt the base model to generate improved prompts

Future Phase: Reinforcement Learning Fine-Tuning

Status: Planned
Method: Proximal Policy Optimization (PPO)
Purpose: Further refine prompt quality
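
Since this phase is only planned, the following is a generic sketch of the clipped surrogate objective at the core of PPO, not the project's implementation. The reward signal, the advantage estimator, and the clipping range (`clip_eps=0.2` here) are all assumptions; none of them are specified in this card.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sampled action (to be maximized)."""
    # Probability ratio between the updated policy and the policy that sampled the action.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum removes any incentive to move the policy far in one update.
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical numbers: the new policy makes a well-rewarded action more likely.
objective = ppo_clipped_objective(logp_new=-1.0, logp_old=-1.5, advantage=2.0)
print(objective)
```

In a prompt-engineering setting the "action" would be a generated token and the advantage would come from some score of the resulting image, but how that score is defined is left open here.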

Ultimate Objective

Accept a base prompt and a preferred generated image as input
Automatically engineer an enhanced prompt
Use the enhanced prompt to generate higher-quality images with the same text-to-image model
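
The three steps above can be sketched as a small pipeline. The function and type names (`regenerate`, `PromptEnhancer`, `TextToImage`) and the stand-in callables are hypothetical and exist only for illustration; in practice the enhancer would wrap this model and the generator would wrap whichever text-to-image model produced the preferred image.

```python
from typing import Any, Callable, Tuple

# Hypothetical interfaces, named here only for illustration.
PromptEnhancer = Callable[[str, Any], str]  # (base prompt, preferred image) -> enhanced prompt
TextToImage = Callable[[str], Any]          # prompt -> generated image


def regenerate(
    base_prompt: str,
    preferred_image: Any,
    enhance: PromptEnhancer,
    text_to_image: TextToImage,
) -> Tuple[str, Any]:
    """Enhance the base prompt, then re-run the same text-to-image model with it."""
    enhanced_prompt = enhance(base_prompt, preferred_image)
    new_image = text_to_image(enhanced_prompt)
    return enhanced_prompt, new_image


# Stand-in callables so the sketch runs without any model weights.
def fake_enhance(base_prompt: str, image: Any) -> str:
    return f"{base_prompt}, highly detailed, soft lighting"


def fake_text_to_image(prompt: str) -> Any:
    return {"rendered_from": prompt}


enhanced, new_image = regenerate(
    "a castle on a hill", "preferred.png", fake_enhance, fake_text_to_image
)
print(enhanced)
```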

Checkpoint Information

This model checkpoint represents the completion of the Supervised Fine-Tuning phase (Phase 1) in the overall project.

Training Limitations

Dataset Size: The model was trained on a limited dataset of 1,600 examples.
Resource Constraints: Due to computational resource limitations, we were unable to use a larger training set.
Potential Issues:
- The model may not have fully generalized to a wide range of inputs.
- There is a risk of overfitting to the training data.
Caution: Users should be aware that the model's performance might be inconsistent on inputs that significantly differ from the training set.

Training and evaluation data

Training and evaluation both use pantheon-prompts-dataset.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_ratio: 0.1
training_steps: 1000
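
The listed values imply a few derived quantities, worked out in the sketch below. It assumes a single training device and that all 1,600 examples mentioned under Training Limitations were used for training. The schedule function follows the usual shape of linear warmup followed by cosine decay with hard restarts; the number of cycles is not stated in this card, so `num_cycles=1` is an assumption, and the function is an approximation rather than the exact scheduler code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-05
train_batch_size = 8
gradient_accumulation_steps = 4
warmup_ratio = 0.1
training_steps = 1000

# Assumption: one device, so the effective batch is batch size x accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
warmup_steps = math.ceil(warmup_ratio * training_steps)
samples_seen = total_train_batch_size * training_steps

# Assumption: all 1,600 examples belong to the training split.
approx_epochs = samples_seen / 1600


def lr_at(step, num_cycles=1):
    """Learning rate at an optimizer step: linear warmup, then cosine with hard restarts."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return learning_rate * max(0.0, cosine)


print(total_train_batch_size, warmup_steps, approx_epochs)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the model passes over the training data about 20 times, which is consistent with the overfitting risk noted under Training Limitations.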

How to use:

import torch
from peft import PeftModelForCausalLM
from transformers import (
    BitsAndBytesConfig,
    InstructBlipProcessor,
    InstructBlipForConditionalGeneration,
)

# Define the quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b", legacy=False)
processor.padding_side = "right"
processor.tokenizer.padding_side = "right"

# Load the base model with 4-bit quantization
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b", quantization_config=bnb_config, device_map="auto"
)

# Attach the fine-tuned LoRA adapter for inference only
model = PeftModelForCausalLM.from_pretrained(
    model,
    "NoyHanan/instructblip-vicuna-7b-peft-lora",
    is_trainable=False,
    adapter_name="lora_policy",
)

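prompt = "<Base_Prompt>"  # placeholder: the base prompt text
image = "<Image>"  # placeholder: replace with a PIL image, e.g. one loaded with PIL.Image.open

# Tokenize the prompt and preprocess the image into PyTorch tensors
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")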

res = model.generate(
    **inputs,
    do_sample=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    top_p=1.0,
    top_k=0,
    temperature=0.5,
)

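# generate() returns a batch of token-id sequences; decode the first (and only) one
enhanced_prompt = processor.batch_decode(res, skip_special_tokens=True)[0].strip()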

Framework versions

PEFT 0.11.1
Transformers 4.41.2
Pytorch 2.3.1+cu121
Datasets 2.19.2
Tokenizers 0.19.1

