OmniGen / docs /fine-tuning.md
yrr
update inference code
a713a09

A newer version of the Gradio SDK is available: 5.4.0

Upgrade

Fine-tuning OmniGen

Fine-tuning Omnigen can better help you handle specific image generation tasks. For example, by fine-tuning on a person's images, you can generate multiple pictures of that person while maintaining task consistency.

A lot of previous work focused on designing new networks to facilitate specific tasks. For instance, ControlNet was proposed to handle image conditions, and IP-Adapter was constructed to maintain ID features. If you want to perform new tasks, you need to build new architectures and repeatedly debug them. Adding and adjusting extra network parameters is usually time-consuming and labor-intensive, which is not user-friendly and cost-efficient enough. However, with Omnigen, all of this becomes very simple.

By comparison, Omnigen can accept multi-modal conditional inputs and has been pre-trained on various tasks. You can fine-tune it on any task without designing specialized networks like ControlNet or IP-Adapter for a specific task.

All you need to do is prepare the data and start training. You can break the limitations of previous models, allowing Omnigen to accomplish a variety of interesting tasks, even those that have never been done before.

Installation

git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .

Full fine-tuning

Fine-tuning command

accelerate launch \
    --num_processes=1 \
    --use_fsdp \
    --fsdp_offload_params false \
    --fsdp_sharding_strategy SHARD_GRAD_OP \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap Phi3DecoderLayer \
    --fsdp_state_dict_type FULL_STATE_DICT \
    --fsdp_forward_prefetch false \
    --fsdp_use_orig_params True \
    --fsdp_cpu_ram_efficient_loading false \
    --fsdp_sync_module_states True \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --json_file ./toy_data/toy_data.jsonl \
    --image_path ./toy_data/images \
    --batch_size_per_device 1 \
    --lr 2e-5 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 100 \
    --log_every 1 \
    --results_dir ./results/toy_finetune

Some important arguments:

  • num_processes: number of GPU to use for training
  • model_name_or_path: path to the pretrained model
  • json_file: path to the json file containing the training data, e.g., ./toy_data/toy_data.jsonl
  • image_path: path to the image folder, e.g., ./toy_data/images
  • batch_size_per_device: batch size per device
  • lr: learning rate
  • keep_raw_resolution: whether to keep the original resolution of the image, if not, all images will be resized to (max_image_size, max_image_size)
  • max_image_size: max image size
  • gradient_accumulation_steps: number of steps to accumulate gradients
  • ckpt_every: number of steps to save checkpoint
  • epochs: number of epochs
  • log_every: number of steps to log
  • results_dir: path to the results folder

The data format of json_file is as follows:

{
    "instruction": str, 
    "input_images": [str, str, ...], 
    "output_images": str
}

You can see a toy example in ./toy_data/toy_data.jsonl.

If an OOM(Out of Memory) issue occurs, you can try to decrease the batch_size_per_device or max_image_size. You can also try to use LoRA instead of full fine-tuning.

Inference

The checkpoint can be found at {results_dir}/checkpoints/*. You can use the following command to load saved checkpoint:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("checkpoint_path")  # e.g., ./results/toy_finetune/checkpoints/0000200

LoRA fine-tuning

LoRA fine-tuning is a simple way to fine-tune OmniGen with less GPU memory. To use lora, you should add --use_lora and --lora_rank to the command.

accelerate launch \
    --num_processes=1 \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 3e-4 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 100 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora

Inference

The checkpoint can be found at {results_dir}/checkpoints/*. You can use the following command to load checkpoint:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path")  # e.g., ./results/toy_finetune_lora/checkpoints/0000100

A simple example

Here is an example for learning new concepts: "sks dog". We use five images of one dog from dog-example.

The json file is ./toy_data/toy_subject_data.jsonl, and the images have been saved in ./toy_data/images.

accelerate launch \
    --num_processes=1 \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora

After training, you can use the following command to generate images:

from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path") # e.g., ./results/toy_finetune_lora/checkpoints/0000200

images = pipe(
    prompt="a photo of sks dog running in the snow", 
    height=1024, 
    width=1024, 
    guidance_scale=3
)
images[0].save("example_sks_dog_snow.png")