VAE for high-resolution image generation with Stable Diffusion
This VAE is fine-tuned by adding a single step of noise to the latent and denoising it with the U-Net, so that the decoder becomes less sensitive to small perturbations in the latent. This reduces the tendency, at high resolutions, for certain objects (plants, eyes, etc.) to be rendered in far more detail than their surroundings. The training set consists of 19k web-published images tagged nijijourneyv5, which were denoised using a model trained on the same dataset.
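The description above is terse, so here is a minimal sketch of how one such training step could look using diffusers components. Everything specific in it is an assumption for illustration, not the author's exact recipe: the base checkpoint name, the choice of timestep t = 1 for the "one step of noise", the MSE reconstruction loss, the placeholder prompt embeddings, and the fact that only the decoder receives gradients (the U-Net step runs under `torch.no_grad()`).

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

# Assumed base checkpoint; the card only says "VAE developed by CompVis".
base = "CompVis/stable-diffusion-v1-4"
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")
unet.requires_grad_(False)  # the U-Net only provides the denoising step

optimizer = torch.optim.AdamW(vae.parameters(), lr=1e-5)

def training_step(pixel_values, prompt_embeds):
    """pixel_values: images scaled to [-1, 1]; prompt_embeds: text-encoder
    output (an empty-prompt embedding would do as a placeholder)."""
    # Encode to latents and add a single (t = 1) step of noise.
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.ones(latents.shape[0], device=latents.device, dtype=torch.long)
    noisy = scheduler.add_noise(latents, noise, t)

    # One frozen U-Net denoising step back toward a clean latent.
    with torch.no_grad():
        pred = unet(noisy, t, encoder_hidden_states=prompt_embeds).sample
        denoised = scheduler.step(pred, 1, noisy).prev_sample

    # Decode the slightly perturbed latent and pull the output toward the
    # original image, so the decoder stops over-reacting to small latent shifts.
    recon = vae.decode(denoised / vae.config.scaling_factor).sample
    loss = F.mse_loss(recon, pixel_values)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```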
sample
training details
- base model: VAE developed by CompVis
- 19k images
- 2 epochs
- Aspect Ratio Bucketing based on 768p resolution
- multires noise (see the sketch after this list)
- lr: 1e-5
- precision: fp32
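Multires (pyramid) noise blends Gaussian noise drawn at several spatial resolutions instead of a single one. A minimal sketch of that technique follows; the `discount` factor and iteration count are illustrative defaults, not the values used for this model.

```python
import torch
import torch.nn.functional as F

def multires_noise(latents, discount=0.8, iterations=6):
    """Pyramid / multires noise: unit-variance Gaussian noise plus
    progressively coarser noise maps upsampled to the latent size
    with geometrically decaying weight."""
    b, c, h, w = latents.shape
    noise = torch.randn_like(latents)
    for i in range(1, iterations):
        scale = 2 ** i
        if h // scale < 1 or w // scale < 1:
            break
        coarse = torch.randn(b, c, h // scale, w // scale,
                             device=latents.device, dtype=latents.dtype)
        noise = noise + (discount ** i) * F.interpolate(
            coarse, size=(h, w), mode="bilinear", align_corners=False)
    return noise / noise.std()  # rescale back to roughly unit variance
```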