Model Card for VRoid Diffusion

This is a latent text-to-image diffusion model that demonstrates how U-Net training affects the generated images.

  • Text Encoder is from OpenCLIP ViT-H/14 (MIT License). Training data: LAION-2B.
  • VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License). Training data: Public Domain/CC0 + Licensed.
  • U-Net is trained from scratch on the full version of the VRoid Image Dataset Lite with some modifications.
  • VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

Model Details

Model Variant

  • vroid_diffusion_test.safetensors
    • Base variant.
  • vroid_diffusion_test_invert_red_blue.safetensors
    • "red" and "blue" in the captions are swapped.
    • "pink" and "skyblue" in the captions are swapped.
  • vroid_diffusion_test_monochrome.safetensors
    • All training images are converted to grayscale (see the preprocessing sketch below).
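
As a rough illustration of how the two modified variants could be derived (this is not the authors' actual preprocessing code; the word list and helper functions are assumptions), a caption color swap and a grayscale conversion might look like this:

```python
from PIL import Image

# Hypothetical color-word pairs swapped in captions for the invert_red_blue variant.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

def swap_colors(caption: str) -> str:
    # Replace each color word with its counterpart, token by token.
    return " ".join(SWAPS.get(word, word) for word in caption.split())

def to_grayscale(path: str) -> Image.Image:
    # Convert a training image to grayscale for the monochrome variant,
    # then back to 3 channels so the VAE input shape is unchanged.
    return Image.open(path).convert("L").convert("RGB")

print(swap_colors("a girl with red hair and skyblue eyes"))
# -> "a girl with blue hair and pink eyes"
```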

Model Description

  • Developed by: Abstract Engine.
  • License: Mitsua Open RAIL-M License.

Uses

Direct Use

Text-to-Image generation for research and educational purposes.
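
Below is a minimal sketch of loading the model for text-to-image generation with the diffusers library. It assumes the checkpoint is published in the standard Stable Diffusion pipeline layout under the Mitsua/vroid-diffusion-test repository; the prompt and sampler settings are illustrative only.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumes the repository follows the standard Stable Diffusion pipeline layout.
pipe = StableDiffusionPipeline.from_pretrained(
    "Mitsua/vroid-diffusion-test",
    torch_dtype=torch.float16,
).to("cuda")

# The U-Net was trained at 256x256, so generate at that resolution.
image = pipe(
    "1girl, red hair, blue eyes, upper body",
    height=256,
    width=256,
    num_inference_steps=30,
).images[0]
image.save("vroid_sample.png")
```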

Out-of-Scope Use

Any deployed use case of the model.

Training Details

  • Training resolution: 256x256
  • Batch size: 48
  • Steps: 45k
  • LR: 1e-5 with 1,000 warmup steps (see the scheduler sketch below)
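
The listed hyperparameters map onto a warmup schedule roughly as sketched below. This assumes a constant-with-warmup schedule and an AdamW optimizer, neither of which is stated on this card; the placeholder module stands in for the U-Net trained from scratch.

```python
import torch
from diffusers.optimization import get_scheduler

# Hyperparameters from the list above.
batch_size = 48
max_train_steps = 45_000
learning_rate = 1e-5
warmup_steps = 1_000

# Placeholder module standing in for the randomly initialized U-Net.
unet = torch.nn.Linear(8, 8)

optimizer = torch.optim.AdamW(unet.parameters(), lr=learning_rate)
lr_scheduler = get_scheduler(
    "constant_with_warmup",  # assumption: the card only states the warmup length
    optimizer=optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=max_train_steps,
)

# In the training loop, each step calls optimizer.step() followed by lr_scheduler.step().
```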

Training Data

We use the full version of the VRoid Image Dataset Lite with some modifications.
