pixart-900m-1024-ft-large

This is a full-rank fine-tune derived from terminusresearch/pixart-900m-1024.

The main validation prompt used during training was:

ethnographic photography of teddy bear at a picnic holding a sign that says SOON, sitting next to a red sphere which is inside a capsule

Validation settings

  • CFG: 8.5
  • CFG Rescale: 0.0
  • Steps: 30
  • Sampler: euler
  • Seed: 42
  • Resolutions: 1024x1024, 1280x768, 960x1152

Note: The validation settings are not necessarily the same as the training settings.
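
To compare the validation resolutions listed above, the short sketch below computes each one's pixel count and aspect ratio. It assumes the values are written as width x height, which the card does not state. The helper name `describe` is illustrative only.

```python
# Validation resolutions from the card, read as (width, height).
# The width-first ordering is an assumption.
resolutions = [(1024, 1024), (1280, 768), (960, 1152)]


def describe(width, height):
    """Return (megapixels, aspect ratio) for one resolution."""
    megapixels = width * height / 1_000_000
    aspect = width / height
    return megapixels, aspect


for width, height in resolutions:
    megapixels, aspect = describe(width, height)
    print(f"{width}x{height}: {megapixels:.3f} MP, aspect {aspect:.3f}")
```

All three sizes come out close to one megapixel, but only the first is square.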

You can find some example images in the following gallery:

The text encoder was not trained. You may reuse the base model text encoder for inference.

Training settings

  • Training epochs: 1
  • Training steps: 6500
  • Learning rate: 1e-06
  • Effective batch size: 384
    • Micro-batch size: 24
    • Gradient accumulation steps: 2
    • Number of GPUs: 8
  • Prediction type: epsilon
  • Rescaled betas zero SNR: False
  • Optimizer: AdamW, stochastic bf16
  • Precision: Pure BF16
  • Xformers: Not used
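
The effective batch size listed above is the product of the three values nested under it. A minimal check, with variable names chosen here for illustration:

```python
# Values copied from the training settings in the card.
micro_batch_size = 24
gradient_accumulation_steps = 2
num_gpus = 8

# Samples contributing to one optimizer step across all devices.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 384
```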

Datasets

photo-concept-bucket

  • Repeats: 0
  • Total number of images: ~559104
  • Total number of aspect buckets: 1
  • Resolution: 1.0 megapixels
  • Cropped: True
  • Crop style: random
  • Crop aspect: square

dalle3

  • Repeats: 0
  • Total number of images: ~972672
  • Total number of aspect buckets: 1
  • Resolution: 1.0 megapixels
  • Cropped: True
  • Crop style: center
  • Crop aspect: square

nijijourney-v6-520k-raw

  • Repeats: 0
  • Total number of images: ~415872
  • Total number of aspect buckets: 1
  • Resolution: 1.0 megapixels
  • Cropped: True
  • Crop style: center
  • Crop aspect: square

midjourney-v6-520k-raw

  • Repeats: 0
  • Total number of images: ~390912
  • Total number of aspect buckets: 1
  • Resolution: 1.0 megapixels
  • Cropped: True
  • Crop style: center
  • Crop aspect: square
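
As a rough consistency check, the dataset sizes above can be compared with the number of samples processed during training. This sketch assumes that every training step consumes one full effective batch of 384 and that each image is drawn once per pass. The image counts are the approximate figures from the card, so the result is an estimate only.

```python
# Approximate image counts as listed in the card (the "~" is dropped here).
dataset_sizes = {
    "photo-concept-bucket": 559_104,
    "dalle3": 972_672,
    "nijijourney-v6-520k-raw": 415_872,
    "midjourney-v6-520k-raw": 390_912,
}
effective_batch_size = 384  # from the training settings
training_steps = 6_500      # from the training settings

total_images = sum(dataset_sizes.values())
# Steps needed to see every image once, under the assumptions above.
steps_per_pass = total_images // effective_batch_size
samples_processed = training_steps * effective_batch_size
passes = samples_processed / total_images

print(f"total images:      {total_images:,}")
print(f"steps per pass:    {steps_per_pass:,}")
print(f"samples processed: {samples_processed:,}")
print(f"approx. passes:    {passes:.2f}")
```

Under these assumptions the 6500 steps correspond to slightly more than one pass over the combined data, which is in line with the single training epoch reported.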

Inference

import torch
from diffusers import DiffusionPipeline

# Full repository id, including the owner namespace.
model_id = "ptx0/pixart-900m-1024-ft-large"
prompt = "ethnographic photography of teddy bear at a picnic holding a sign that says SOON, sitting next to a red sphere which is inside a capsule"
negative_prompt = "malformed, disgusting, overexposed, washed-out"

# Pick the best available device once and reuse it below.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline.to(device)
image = pipeline(
    prompt=prompt,
    # Use the negative prompt defined above (the original passed a hard-coded 'blurry').
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    generator=torch.Generator(device=device).manual_seed(1641421826),
    width=1152,
    height=768,
    guidance_scale=8.5,
    guidance_rescale=0.0,
).images[0]
image.save("output.png", format="PNG")