---
license: other
library_name: peft
tags:
- generated_from_trainer
base_model: Salesforce/instructblip-vicuna-7b
datasets:
- pantheon-prompts-dataset
model-index:
- name: instructblip-vicuna-7b-peft-lora
  results: []
---

# instructblip-vicuna-7b-peft-lora

This model is a fine-tuned version of [Salesforce/instructblip-vicuna-7b](https://huggingface.co/Salesforce/instructblip-vicuna-7b) on the [pantheon-prompts-dataset](https://huggingface.co/datasets/Idor980/pantheon-prompts-dataset) dataset.
It achieves the following results on the evaluation set:
- Loss: 5.3583

# Model Description

## Project Overview

This model is part of a two-phase project aimed at automatic prompt engineering for text-to-image generation.

## Current Phase: Supervised Fine-Tuning

- **Status**: Completed
- **Input**: Base prompt and an image
- **Output**: Enhanced prompt for image generation
- **Purpose**: Adapt the base model to generate improved prompts

## Future Phase: Reinforcement Learning Fine-Tuning

- **Status**: Planned
- **Method**: Proximal Policy Optimization (PPO)
- **Purpose**: Further refine prompt quality

## Ultimate Objective

1. Accept a base prompt and a preferred generated image as input
2. Automatically engineer an enhanced prompt
3. Use the enhanced prompt to generate higher-quality images with the same text-to-image model

## Checkpoint Information

This model checkpoint represents the completion of the Supervised Fine-Tuning phase (Phase 1) in the overall project.

## Training Limitations

- **Dataset Size**: The model was trained on a limited dataset of 1,600 examples.
- **Resource Constraints**: Due to computational resource limitations, we were unable to use a larger training set.
- **Potential Issues**:
  - The model may not have fully generalized to a wide range of inputs.
  - There is a risk of overfitting to the training data.
- **Caution**: Users should be aware that the model's performance might be inconsistent on inputs that differ significantly from the training set.
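To put the overfitting risk in numbers, the step budget from the training hyperparameters below can be converted into approximate passes over the data. This is a minimal sketch, assuming all 1,600 examples were in the training split and training ran on a single device (8 per-device batch x 4 accumulation steps x 1 device matches the listed `total_train_batch_size` of 32). If part of the 1,600 examples was held out for evaluation, the pass count would be higher.

```python
# Figures copied from this card; NUM_DEVICES and the split size are assumptions.
TRAIN_EXAMPLES = 1_600   # dataset size stated under Training Limitations (assumed all used for training)
TRAINING_STEPS = 1_000   # training_steps
PER_DEVICE_BATCH = 8     # train_batch_size
GRAD_ACCUM_STEPS = 4     # gradient_accumulation_steps
NUM_DEVICES = 1          # assumption: 8 x 4 x 1 = 32 matches total_train_batch_size


def effective_batch_size(per_device: int, grad_accum: int, devices: int) -> int:
    """Number of examples contributing to a single optimizer update."""
    return per_device * grad_accum * devices


def approx_epochs(steps: int, batch: int, dataset_size: int) -> float:
    """Approximate number of full passes over the training set for a step budget."""
    return steps * batch / dataset_size


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES)
epochs = approx_epochs(TRAINING_STEPS, batch, TRAIN_EXAMPLES)
print(f"effective batch size: {batch}")  # effective batch size: 32
print(f"approx. epochs: {epochs:.1f}")   # approx. epochs: 20.0
```

Under these assumptions each training example is seen roughly 20 times, which is consistent with the overfitting caution above.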
## Training and evaluation data

[pantheon-prompts-dataset](https://huggingface.co/datasets/Idor980/pantheon-prompts-dataset)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 1000

## How to use

```python
import torch
from peft import PeftModelForCausalLM
from PIL import Image
from transformers import (
    BitsAndBytesConfig,
    InstructBlipForConditionalGeneration,
    InstructBlipProcessor,
)

# Load the base model in 4-bit NF4 to reduce GPU memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b", legacy=False)
processor.padding_side = "right"
processor.tokenizer.padding_side = "right"

model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter from this repository (inference only)
model = PeftModelForCausalLM.from_pretrained(
    model,
    "NoyHanan/instructblip-vicuna-7b-peft-lora",
    is_trainable=False,
    adapter_name="lora_policy",
)

prompt = ""  # the base prompt to enhance
image = Image.open("path/to/image.png").convert("RGB")  # placeholder path: the generated image

# The processor takes `text` (not `texts`); return PyTorch tensors so .to() and generate() work
inputs = processor(images=[image], text=prompt, return_tensors="pt").to("cuda")
res = model.generate(
    **inputs,
    do_sample=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    top_p=1.0,
    top_k=0,  # 0 disables top-k filtering
    temperature=0.5,
)
# generate() returns a batch of sequences; decode the first (and only) one
enhanced_prompt = processor.batch_decode(res, skip_special_tokens=True)[0]
```

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
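The learning-rate schedule implied by the training hyperparameters above can be sketched in plain Python. This assumes the run used the Hugging Face `Trainer` without extra `lr_scheduler_kwargs`; in that case `cosine_with_restarts` falls back to `num_cycles=1`, so over the 1,000 steps the rate warms up linearly for 100 steps (10%, rounded up) and then follows a single cosine decay to zero without actually restarting. The formula mirrors our reading of the transformers hard-restarts schedule; it is an illustration, not code taken from the training run.

```python
import math

# Hyperparameters copied from the list above; NUM_CYCLES is an assumed default.
PEAK_LR = 2e-5          # learning_rate
TRAINING_STEPS = 1000   # training_steps
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
NUM_CYCLES = 1          # assumption: default num_cycles for cosine_with_restarts

# Warmup steps derived from the ratio, rounded up
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def lr_multiplier(step: int) -> float:
    """Linear warmup, then cosine decay with hard restarts every 1/NUM_CYCLES of the run."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # With NUM_CYCLES = 1, (progress % 1.0) == progress, so there is no restart
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0))))


def learning_rate(step: int) -> float:
    """Absolute learning rate at a given optimizer step."""
    return PEAK_LR * lr_multiplier(step)


for step in (0, 50, 100, 550, 999, 1000):
    print(f"step {step:4d}: lr = {learning_rate(step):.3e}")
```

Setting `NUM_CYCLES` above 1 in this sketch shows what a schedule with real restarts would look like: the rate jumps back to the peak at the start of each cycle.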