flux-lora-resizing / README.md
sayakpaul HF staff
language:
  - en
library_name: diffusers
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md

LoRA is the de facto technique for quickly adapting a large pre-trained model to custom use cases. LoRA matrices are typically low-rank. What counts as “low” depends on the context, but for a large diffusion model like Flux, a rank of 128 can be considered high. This is because users often need to keep multiple LoRAs unfused in memory to switch between them quickly. So, the higher the rank, the more memory is needed on top of the base model.
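To make the memory argument concrete, here is a rough back-of-the-envelope sketch. The layer width of 3072 matches the shapes used later in this README; the helper name `lora_param_count` and the 2-bytes-per-parameter (bfloat16) assumption are ours, for illustration only.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank, in), B is (out, rank)."""
    return rank * (in_features + out_features)


BYTES_PER_PARAM = 2  # assumption: weights stored in bfloat16

for rank in (4, 16, 128):
    n_params = lora_param_count(3072, 3072, rank)
    kib = n_params * BYTES_PER_PARAM / 1024
    print(f"rank={rank:>3}: {n_params:>7} params, {kib:.0f} KiB per adapted layer")
```

The cost is linear in the rank, so going from rank 128 to rank 4 shrinks each adapted layer by a factor of 32. The total for a checkpoint also depends on how many layers carry a LoRA, which this sketch does not model.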

So, what if we could take an existing LoRA checkpoint with a high rank and reduce its rank even further to:

  • Reduce the memory requirements
  • Enable use cases like torch.compile() (which require all the LoRAs to be of the same rank to avoid re-compilation)

This project explores two options to reduce the original LoRA checkpoint into an even smaller one:

  • Random projections
  • SVD

We have also explored the opposite direction, i.e., taking a low-rank LoRA and increasing its rank with orthogonal completion. Check out the “LoRA rank upsampling” section for more details (code, results, etc.).

Random projections

Basic idea:

  1. Generate a random projection matrix: R = torch.randn(new_rank, original_rank, dtype=torch.float32) / torch.sqrt(torch.tensor(new_rank, dtype=torch.float32)).

  2. Then compute the new LoRA up and down matrices:

    # We keep R in torch.float32 for numerical stability.
    lora_A_new = (R @ lora_A.to(R.dtype)).to(lora_A.dtype)
    lora_B_new = (lora_B.to(R.dtype) @ R.T).to(lora_B.dtype)
    

    If lora_A and lora_B had shapes of (42, 3072) and (3072, 42) respectively, lora_A_new and lora_B_new will have (4, 3072) and (3072, 4), respectively.
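The two steps above can be put together into a small, self-contained sketch. The tensors here are random stand-ins for a real LoRA pair, and `mean_gram` is a hypothetical helper we add only to illustrate why `R` is scaled by `1 / sqrt(new_rank)`: with that scaling, `R.T @ R` equals the identity in expectation, so `lora_B_new @ lora_A_new` matches `lora_B @ lora_A` on average.

```python
import torch

torch.manual_seed(0)

original_rank, new_rank, dim = 42, 4, 3072

# Random stand-ins for a real LoRA pair, cast to a typical checkpoint dtype.
lora_A = torch.randn(original_rank, dim).to(torch.bfloat16)
lora_B = torch.randn(dim, original_rank).to(torch.bfloat16)

# Gaussian projection, kept in float32 for numerical stability.
R = torch.randn(new_rank, original_rank, dtype=torch.float32) / torch.sqrt(
    torch.tensor(new_rank, dtype=torch.float32)
)

# Project in float32, then cast back to the original dtype.
lora_A_new = (R @ lora_A.to(R.dtype)).to(lora_A.dtype)
lora_B_new = (lora_B.to(R.dtype) @ R.T).to(lora_B.dtype)
print(lora_A_new.shape, lora_B_new.shape)


def mean_gram(new_rank: int, original_rank: int, n_draws: int = 2000) -> torch.Tensor:
    """Monte Carlo estimate of E[R.T @ R] over many random projections."""
    Rs = torch.randn(n_draws, new_rank, original_rank) / new_rank**0.5
    return (Rs.transpose(1, 2) @ Rs).mean(dim=0)


# Should be close to the identity matrix of size original_rank.
print((mean_gram(4, 8) - torch.eye(8)).abs().max())
```

A single draw of `R` is still a noisy approximation, which is why the choice of `new_rank` matters for output quality.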

Results

Tried on this LoRA: https://huggingface.co/glif/how2draw. Unless explicitly specified, a rank of 4 was used for all experiments. Here’s a side-by-side comparison of the original and the reduced LoRAs (on the same seed).

Inference code
from diffusers import DiffusionPipeline 
import torch 

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
# Change accordingly.
lora_id = "How2Draw-V2_000002800_svd.safetensors"
pipe.load_lora_weights(lora_id)

prompts = [
    "Yorkshire Terrier with smile, How2Draw",
    "a dolphin, How2Draw",
    "an owl, How3Draw",
    "A silhouette of a girl performing a ballet pose, with elegant lines to suggest grace and movement. The background can include simple outlines of ballet shoes and a music note. The image should convey elegance and poise in a minimalistic style, How2Draw"
]
images = pipe(
    prompts, num_inference_steps=50, max_sequence_length=512, guidance_scale=3.5, generator=torch.manual_seed(0)
).images
Image 1 Yorkshire Terrier with smile, How2Draw
Image 2 a dolphin, How2Draw
Image 3 an owl, How3Draw
Image 4 A silhouette of a girl performing a ballet pose, with elegant lines to suggest grace and movement. The background can include simple outlines of ballet shoes and a music note. The image should convey elegance and poise in a minimalistic style, How2Draw

Code: low_rank_lora.py

Notes

  • One should experiment with the new_rank parameter to obtain the desired trade-off between performance and memory. With a new_rank of 4, we reduce the size of the LoRA from 451MB to 42MB.
  • There is a use_sparse option in the script above for using sparse random projection matrices.
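For readers unfamiliar with sparse random projections, here is a minimal sketch of one common construction (Achlioptas-style). This is an assumption on our part and not necessarily what `use_sparse` does in the script; the function name `sparse_projection` and the sparsity parameter `s` are hypothetical.

```python
import torch


def sparse_projection(
    new_rank: int, original_rank: int, s: float = 3.0, seed: int = 0
) -> torch.Tensor:
    """Entries are +sqrt(s), 0, -sqrt(s) with probabilities
    1/(2s), 1 - 1/s, 1/(2s), then scaled by 1/sqrt(new_rank)."""
    g = torch.Generator().manual_seed(seed)
    u = torch.rand(new_rank, original_rank, generator=g)
    R = torch.zeros(new_rank, original_rank, dtype=torch.float32)
    R[u < 1 / (2 * s)] = s**0.5
    R[u > 1 - 1 / (2 * s)] = -(s**0.5)
    return R / new_rank**0.5


# Can be used in place of the dense Gaussian R from the steps above.
R = sparse_projection(new_rank=4, original_rank=42)
print(R.shape, (R == 0).float().mean())
```

Each entry has unit variance before the final scaling, so the same "identity in expectation" argument as for the dense Gaussian matrix applies, while roughly `1 - 1/s` of the entries are zero.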

SVD

Results

[Four result images for the SVD-reduced LoRA]

Randomized SVD

Full SVD can be time-consuming. Truncated SVD is useful for very large sparse matrices. We can use randomized SVD instead: it is significantly faster, with little to no loss in quality.
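As a rough illustration of the difference, the sketch below reduces a toy LoRA pair with both a full SVD (`torch.linalg.svd`) and a randomized SVD (`torch.svd_lowrank`). The tensors are random stand-ins, the sizes are kept small so it runs quickly, and the helpers `split_factors` and `rel_err` are hypothetical names; the actual script may organize this differently.

```python
import torch

torch.manual_seed(0)

original_rank, new_rank, dim = 32, 4, 256  # small sizes for a quick run

lora_A = torch.randn(original_rank, dim)
lora_B = torch.randn(dim, original_rank)
delta_w = lora_B @ lora_A  # the full low-rank update, shape (dim, dim)


def split_factors(U, S, Vh):
    """Fold sqrt(S) into both sides so the two factors have similar scale."""
    sqrt_S = torch.sqrt(S)
    return U * sqrt_S, sqrt_S[:, None] * Vh  # shapes (dim, r) and (r, dim)


def rel_err(B, A):
    """Relative Frobenius error of B @ A against the original update."""
    return (torch.linalg.norm(delta_w - B @ A) / torch.linalg.norm(delta_w)).item()


# Full SVD, then keep only the top `new_rank` singular triplets.
U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
B_full, A_full = split_factors(U[:, :new_rank], S[:new_rank], Vh[:new_rank])

# Randomized SVD approximates just the top `new_rank` triplets.
# `niter` is the number of subspace iterations; more is slower but more accurate.
U_r, S_r, V_r = torch.svd_lowrank(delta_w, q=new_rank, niter=4)
B_rand, A_rand = split_factors(U_r, S_r, V_r.T)

print(f"full SVD:       {rel_err(B_full, A_full):.4f}")
print(f"randomized SVD: {rel_err(B_rand, A_rand):.4f}")
```

The truncated full SVD gives the best possible rank-`new_rank` approximation in Frobenius norm, so the randomized result can only match or trail it. Random stand-in matrices have a nearly flat spectrum, so the errors printed here will look much worse than for a trained LoRA, whose singular values typically decay faster.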

Results

[Four result images for the LoRA reduced with randomized SVD]

Code: svd_low_rank_lora.py

Tune the knobs in SVD

  • new_rank as always
  • niter when using randomized SVD

Reduced checkpoints

LoRA rank upsampling

We also explored the opposite direction of what we presented above. We do this by using "orthogonal extension" across the rank dimensions. Since we are increasing the ranks, we thought "rank upsampling" was a cool name! Check out the upsample_lora_rank.py script for the implementation.
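To give an intuition for what an orthogonal extension can look like, here is a minimal sketch of one possible reading. It is an assumption on our part and may differ from what upsample_lora_rank.py actually does: the down matrix gets extra rows that are orthonormal and orthogonal to its existing rows, and the up matrix is padded with zero columns so that the product `lora_B @ lora_A` is unchanged. All tensors are random stand-ins.

```python
import torch

torch.manual_seed(0)

rank, new_rank, dim = 4, 16, 64  # small sizes for illustration
extra = new_rank - rank

lora_A = torch.randn(rank, dim, dtype=torch.float64)
lora_B = torch.randn(dim, rank, dtype=torch.float64)

# QR of [A.T | random] yields an orthonormal basis whose first `rank` columns
# span the row space of A; the remaining columns are orthogonal to that space.
stacked = torch.cat([lora_A.T, torch.randn(dim, extra, dtype=lora_A.dtype)], dim=1)
Q, _ = torch.linalg.qr(stacked)  # Q has shape (dim, new_rank)
new_rows = Q[:, rank:].T  # (extra, dim), orthonormal rows

# Extend A with the new directions and pad B with zeros.
lora_A_up = torch.cat([lora_A, new_rows], dim=0)  # (new_rank, dim)
lora_B_up = torch.cat(
    [lora_B, torch.zeros(dim, extra, dtype=lora_B.dtype)], dim=1
)  # (dim, new_rank)

print(lora_A_up.shape, lora_B_up.shape)
print(torch.allclose(lora_B_up @ lora_A_up, lora_B @ lora_A))
```

With zero-padded columns the upsampled LoRA computes the same update as the original, while its shapes now match those of a rank-16 LoRA.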

We applied this technique to cocktailpeanut/optimus to increase the rank from 4 to 16. You can find the checkpoint here.

Results

Right: original. Left: upsampled.

Image 1 optimus is cleaning the house with broomstick
Image 2 optimus is a DJ performing at a hip nightclub
Image 3 optimus is competing in a bboy break dancing competition
Image 4 optimus is playing tennis in a tennis court
Code
from diffusers import FluxPipeline
import torch 

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Change this accordingly.
pipeline.load_lora_weights("optimus_16.safetensors")

prompts = [
    "optimus is cleaning the house with broomstick",
    "optimus is a DJ performing at a hip nightclub",
    "optimus is competing in a bboy break dancing competition",
    "optimus is playing tennis in a tennis court"
]
images = pipeline(
    prompts, 
    num_inference_steps=50,
    guidance_scale=3.5,
    max_sequence_length=512,
    generator=torch.manual_seed(0)
).images
for i, image in enumerate(images):
    image.save(f"{i}.png")