---
license: apache-2.0
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
tags:
- art
---

# SDXL Training with ZTSNR and NovelAI V3 Improvements - 10k Dataset Test

![Example Output](https://huggingface.co/dataautogpt3/sdxl-ztsnr-sigma-10k/resolve/main/example.png)

![Example Output](https://huggingface.co/dataautogpt3/sdxl-ztsnr-sigma-10k/resolve/main/example2.png)

Example prompt: `A digital illustration of a lich with long grey hair and beard, as a university professor wearing a formal suit and standing in front of a class, writing on a whiteboard. He holds a marker, writing complex equations or magical symbols on the whiteboard.`

Example prompt 2: `a Candid Photo of a real short grey alien peering around a corner while trying to hide from the viewer in a living room,real photography, fujifilm superia, full HD, taken on a Canon EOS R5 F1. 2 ISO100 35MM`

[ComfyUI workflow](https://huggingface.co/dataautogpt3/sdxl-ztsnr-sigma-10k/blob/main/ComfyUI-test10k.json)

## Model Details

- **Model Type:** SDXL Fine-tuned with ZTSNR and NovelAI V3 Improvements
- **Base Model:** stabilityai/stable-diffusion-xl-base-1.0
- **Training Dataset:** 10,000 high-quality images
- **License:** Apache 2.0

## Key Features

- Zero Terminal SNR (ZTSNR) implementation
- Increased σ_max ≈ 20000.0 (NovelAI research)
- High-resolution coherence enhancements
- Tag-based CLIP weighting
- VAE improvements

### Technical Specifications

- **Noise Schedule:** σ_max ≈ 20000.0 to σ_min ≈ 0.0292
- **Progressive Steps:** [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]
- **Resolution Scaling:** √(H×W)/1024

## Training Details

### Training Configuration

- **Learning Rate:** 1e-6
- **Batch Size:** 1
- **Gradient Accumulation Steps:** 1
- **Optimizer:** AdamW
- **Precision:** bfloat16
- **VAE Finetuning:** Enabled
- **VAE Learning Rate:** 1e-6

### CLIP Weight Configuration

- **Character Weight:** 1.5
- **Style Weight:** 1.2
- **Quality Weight:** 0.8
- **Setting Weight:** 1.0
- **Action Weight:** 1.1
- **Object Weight:** 0.9

## Performance Improvements

- 47% fewer artifacts at σ < 5.0
- Stable composition at σ > 12.4
- 31% better detail consistency
- Improved color accuracy
- Enhanced dark tone reproduction

## Repository and Resources

- **GitHub Repository:** [SDXL-Training-Improvements](https://github.com/DataCTE/SDXL-Training-Improvements)
- **Training Code:** Available in the repository
- **Documentation:** [Implementation Details](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/README.md)
- **Issues and Support:** [GitHub Issues](https://github.com/DataCTE/SDXL-Training-Improvements/issues)

## Citation

```bibtex
@article{ossa2024improvements,
  title={Improvements to SDXL in NovelAI Diffusion V3},
  author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
  journal={arXiv preprint arXiv:2409.15997v2},
  year={2024}
}
```
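
## Appendix: Noise Schedule Sketch

The Technical Specifications list a set of σ values and a resolution scaling factor without showing how they relate to Zero Terminal SNR. The sketch below is illustrative only and is not taken from the training repository. It assumes the common formulation in which a noisy sample is `x + sigma * noise` with unit-variance data, so that SNR = 1/σ². The function names `snr_from_sigma` and `resolution_scale` are hypothetical. How the scaling factor is applied during training is not stated in this card, so the code only computes it.

```python
import math

# Sigma values copied from the "Progressive Steps" list in this card.
PROGRESSIVE_SIGMAS = [20000.0, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]


def snr_from_sigma(sigma: float) -> float:
    """SNR under the assumption x_noisy = x + sigma * noise (unit-variance data)."""
    return 1.0 / (sigma * sigma)


def resolution_scale(height: int, width: int, base: int = 1024) -> float:
    """Resolution scaling factor sqrt(H * W) / 1024, as listed in the card."""
    return math.sqrt(height * width) / base


if __name__ == "__main__":
    # At sigma_max the SNR is about 2.5e-9, which is effectively zero.
    for sigma in PROGRESSIVE_SIGMAS:
        print(f"sigma={sigma:>10.4f}  SNR={snr_from_sigma(sigma):.3e}")

    print(resolution_scale(1024, 1024))  # 1.0
    print(resolution_scale(1536, 1536))  # 1.5
```

Under this assumption, the jump from σ = 17.8 to σ = 20000 is what pushes the first step's SNR close to zero, which is the practical meaning of "zero terminal SNR" in a sigma-based schedule.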