---
license: creativeml-openrail-m
language:
  - en
thumbnail: >-
  https://huggingface.co/Norod78/sd2-cartoon-blip/raw/main/example/sd2-cartoon-blip-sample_tile-0.jpg
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
datasets:
  - Norod78/cartoon-blip-captions
inference: true
---

# Cartoon diffusion v2.0

*Stable Diffusion v2.0 fine-tuned on images from various cartoon shows.*

If you want more details on how to generate your own BLIP-captioned dataset, see this Colab.
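The Colab itself is not reproduced here, but for orientation, a minimal BLIP-captioning sketch with the `transformers` library might look like the following. The choice of `Salesforce/blip-image-captioning-base` and the input filename are illustrative assumptions, not necessarily what was used for this dataset:

```python
# Minimal BLIP-captioning sketch (assumed setup, not the exact Colab code)
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("cartoon_frame.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))  # the generated caption
```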

Training was done using a slightly modified version of Hugging Face's text-to-image training example script.

## About

Enter a text prompt to generate cartoony images.

## AUTOMATIC1111 webui checkpoint

The main folder contains a `.ckpt` checkpoint and a matching `.yaml` config file. Place both in the webui's `stable-diffusion-webui/models/Stable-diffusion` folder, then select the checkpoint in the UI to generate images.
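If you prefer to fetch those files programmatically, a sketch with `huggingface_hub` could look like this; the exact filenames are assumptions, so check the repository's file listing:

```python
# Hedged sketch: download the webui checkpoint + config with huggingface_hub.
# The filenames below are assumptions -- verify them in the repo's file list.
from huggingface_hub import hf_hub_download

for filename in ["sd2-cartoon-blip.ckpt", "sd2-cartoon-blip.yaml"]:
    path = hf_hub_download(
        repo_id="Norod78/sd2-cartoon-blip",
        filename=filename,
        local_dir="stable-diffusion-webui/models/Stable-diffusion",
    )
    print("Downloaded to", path)
```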

## Sample code

```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
import torch

# This will substitute the default PNDM scheduler for K-LMS
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

guidance_scale = 8.5
steps = 50

cartoon_model_path = "Norod78/sd2-cartoon-blip"
cartoon_pipe = StableDiffusionPipeline.from_pretrained(cartoon_model_path, scheduler=lms, torch_dtype=torch.float16)
cartoon_pipe.to("cuda")

def generate(prompt, file_prefix, samples, seed=42):
    torch.manual_seed(seed)
    prompt += ", Very detailed, clean, high quality, sharp image"
    cartoon_images = cartoon_pipe([prompt] * samples, num_inference_steps=steps, guidance_scale=guidance_scale).images
    for idx, image in enumerate(cartoon_images):
        image.save(f"{file_prefix}-{idx}-{seed}-sd2-cartoon-blip.jpg")

generate("An oil on canvas portrait of Snoop Dogg, Mark Ryden", "01_SnoopDog", 2, 777)
generate("A flemish baroque painting of Kermit from the muppet show", "02_KermitFlemishBaroque", 2, 42)
generate("Gal Gadot in Avatar", "03_GalGadotAvatar", 2, 777)
generate("Ninja turtles, Naoto Hattori", "04_TMNT", 2, 312)
generate("An anime town", "05_AnimeTown", 2, 777)
generate("Family guy taking selfies at the beach", "06_FamilyGuy", 2, 555)
generate("Pikachu as Rick and morty, Eric Wallis", "07_PikachuRnM", 2, 777)
generate("Pikachu as Spongebob, Eric Wallis", "08_PikachuSpongeBob", 2, 42)
generate("An oil painting of Miss Piggy from the muppets as the Mona Lisa", "09_MsPiggyMonaLisa", 2, 42)
generate("Rick Sanchez in star wars, Dave Dorman", "10_RickStarWars", 2, 42)
generate("A painting of Southpark with a rainbow", "11_Southpark", 2, 777)
generate("An oil painting of Phineas and Ferb hammering on a new machine, Eric Wallis", "12_PhineasPherb", 2, 777)
generate("Bender, Saturno Butto", "13_Bender", 2, 777)
generate("A psychedelic image of Bojack Horseman", "14_Bojack", 2, 777)
generate("A movie poster for Gravity Falls Cthulhu stories", "15_GravityFalls", 2, 777)
generate("A vibrant oil painting portrait of She-Ra", "16_Shira", 2, 512)
```

Images generated by this sample code

## Dataset and Training

Fine-tuned for 25,000 iterations on top of stabilityai/stable-diffusion-2-base, using BLIP-captioned cartoon images (Norod78/cartoon-blip-captions) on a single A5000 GPU in my home desktop computer. A quick way to peek at the dataset is sketched below.
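This is a minimal sketch using the `datasets` library; the `image` and `text` column names are my assumption about the dataset layout, not something stated in this card:

```python
# Minimal sketch: inspect the BLIP-captioned cartoon dataset.
# Column names ("image", "text") are assumed, not verified.
from datasets import load_dataset

ds = load_dataset("Norod78/cartoon-blip-captions", split="train")
print(ds)             # row count and column names
print(ds[0]["text"])  # a BLIP caption (assumed column name)
ds[0]["image"].show() # the matching PIL image (assumed column name)
```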

Trained by @Norod78