State of the model?

#4
by GeroldMeisinger - opened

thank you so much for training the first SD3 controlnet models and integrating support in diffusers!
in https://github.com/huggingface/diffusers/pull/8566#issuecomment-2169316913 you mention the model is "beta". please also mention the current state in the readme and keep it updated. is it considered release quality yet?

as of now my canny results are rather uncanny:

canny 1024x1024 (low=0.1, high=0.4)

conditioning_scale=0.7

conditioning_scale=1.0

Stable Diffusion 3 in ComfyUI (no controlnet)
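For reference, the low/high values in the canny settings above are hysteresis thresholds: gradients above high become strong edges, gradients between low and high are kept only if they connect to a strong edge. A minimal numpy sketch of that final stage (illustrative only; the real Canny detector also does Gaussian smoothing and non-maximum suppression first):

```python
import numpy as np

def hysteresis_threshold(grad, low=0.1, high=0.4):
    """Classify normalized gradient magnitudes into edge pixels.

    Strong edges (> high) are always kept; weak edges (between low
    and high) survive only if an 8-neighbour is a strong edge.
    One propagation pass shown here; a full implementation iterates
    until no more weak pixels are promoted.
    """
    strong = grad > high
    weak = (grad > low) & ~strong
    keep = strong.copy()
    padded = np.pad(strong, 1)
    h, w = grad.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # strong-edge mask shifted by (dy, dx)
            shifted = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            keep |= weak & shifted
    return keep
```

With low=0.1 and high=0.4 on a sparse input, few pixels clear the high threshold, which is why the resulting edge map can look thin.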

GeroldMeisinger changed discussion title from Mention release state of the model (current results with canny are... uncanny :) to State of the model?
InstantX org

Your canny image is kind of sparse. Try another canny image with scale=0.8

canny-edge.jpg

prompt: a full-length portrait of a young woman with a pearl earring and a blue head scarf is captured in a close-up shot against a dark backdrop. the woman is facing the viewer, her head turned slightly to the right. her hair is neatly pulled back into a blue head scarf, which is draped over her left shoulder. the scarf is tied at the back with a white collar. her eyes are wide open, and she has a red lip. her mouth is slightly ajar, revealing a hint of teeth. her ears are pierced with a gold earring, and a pearl earring is dangling from her left ear. her right ear is covered by a yellow scarf, which is draped over her left shoulder. the backdrop is a dark brown canvas, providing a stark contrast to the woman's vibrant colors.

image.jpg
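For context on the scale=0.8 suggestion above: in diffusers, the conditioning scale multiplies the ControlNet's residual features before they are added to the denoiser's hidden states, so lowering it weakens the control signal rather than changing the canny thresholds. A rough numpy sketch of the idea (names are illustrative, not the actual diffusers internals):

```python
import numpy as np

def apply_control(hidden, control_residual, conditioning_scale=0.8):
    """Blend ControlNet features into the denoiser's hidden states.

    scale=0.0 ignores the control image entirely; 1.0 applies the
    full residual. Illustrative sketch, not the diffusers API.
    """
    return hidden + conditioning_scale * control_residual
```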

sparse maybe, but it is one of the test images from the original ControlNet repo: https://github.com/lllyasviel/ControlNet#controlnet-with-canny-edge
and personally I found dog2 to be one of the best example images for evaluating edge-detection controlnets in my own controlnet trainings, as outlined here: https://github.com/lllyasviel/ControlNet/discussions/318#discussioncomment-7176692

this is the result I get without using a controlnet, so it's hard to tell what part the controlnet actually played:

to me it seems the controlnet is not yet fully trained(?), which is why I'm asking about the release state. what was the batch size, and how many samples/epochs?

backlink to github. user kijai seems to get good results too: https://github.com/comfyanonymous/ComfyUI/issues/3734#issuecomment-2186084970

@wanghaofan
I tried the ComfyUI implementation now but am still not getting good results with canny :/ I'm very grateful for your efforts and I want this to work.
Can you please tell me what I'm doing wrong, or provide more examples? (the ComfyUI workflow is included in the image):
dog2result.png

dog2upscale.png

dog2canny.png

I trained canny controlnets on my own, and this result looks to me as if a) the CN didn't fully converge yet, or b) the model collapsed at some point. Canny is usually very resilient to bad input.

@GeroldMeisinger hey! unrelated to SD 3 CN

I found your article on training controlnets super insightful and would love to chat / collaborate on training an SDXL CN!

I am building https://glyf.space/

3D rendering powered by SD

email me at rishi@glyf.space if you are interested in chatting!

@GeroldMeisinger Hey, I'm also training SD3 controlnets and experiencing the "mode collapse" problem. Have you figured out the cause of this phenomenon? My dataset is about 5M samples with softedge condition images. The training batch size is 120. The training starts to converge within 1k steps, but after about 12k iterations the results start to "collapse", and after 17k iterations they are totally collapsed.
I've been searching for similar issues for days. This discussion seems to be the only one closely related to what I'm experiencing.

image.png
1000 iterations

image.png
12000 iterations

image.png
17000 iterations
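For a sense of scale: with a batch size of 120, even 17k iterations only cover a fraction of a 5M-sample dataset, so the collapse happens well within the first epoch. A quick back-of-the-envelope check:

```python
def epochs_seen(iterations, batch_size, dataset_size):
    """Fraction of the dataset seen after `iterations` training steps
    (assuming each step draws batch_size fresh samples)."""
    return iterations * batch_size / dataset_size

# numbers from the report above: batch 120, ~5M softedge pairs
print(epochs_seen(17_000, 120, 5_000_000))  # 0.408 -> under half an epoch
```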

yes, I sometimes got the same result where images started to look grainy, see here https://civitai.com/articles/2078#heading-35435 -> failed training. I can't tell you why this happens; it seems to occur at random in some of my training runs. if I just started again with the same settings, it worked. my assumption is that under certain circumstances you get a value overflow, and at that point the model cannot heal anymore. just restart it.
if you get convergence at 1k steps already, increase the total batch size and reduce the learning rate a bit, see here https://github.com/lllyasviel/ControlNet/discussions/318#discussioncomment-7176692 . or if you don't care about quality so much and it works already, just take an earlier checkpoint (e.g. 10k).
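The "just restart it" workaround can be partially automated by checking for non-finite values each step and rolling back to the last good checkpoint. A minimal, framework-agnostic numpy sketch of the idea (`step_with_rollback` and its signature are illustrative, not part of any training script mentioned here):

```python
import numpy as np

def step_with_rollback(weights, grad, lr, checkpoint):
    """One SGD step that rolls back on a non-finite update.

    `checkpoint` holds the last known-good weights; on overflow
    (inf/NaN) we restore it instead of letting the run "collapse"
    permanently. Returns (weights, ok) where ok=False signals that
    a rollback happened and the run should be restarted.
    """
    new_weights = weights - lr * grad
    if not np.all(np.isfinite(new_weights)):
        return checkpoint.copy(), False
    return new_weights, True
```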


Hi, I have met exactly the same issue as you: after ~15k iters the image is full of blocks. Could you contact me by email: xduzhangjiayu@163.com ? We can talk about it together. I've been stuck on this for two months.


Hey,
Here is my result (the image is too blocky), similar to your results?
Are you suggesting to increase the batch_size? And what do you mean by "just restart it"?

img.png

I think the image looks "fine" in the sense that this is not the "model collapse" I noticed when training controlnets on SD1.5. what I saw there was images becoming more and more grainy and overbright, and the dog and cat would deform. unfortunately I never saved any images of this effect. what you are seeing are deconvolution artifacts: https://www.neuralception.com/convs-deconvs-artifacts/ . however I cannot tell you why this happens, or whether it is related to SD3, SD3 controlnets, or your training. I only ever noticed them in vanilla Flux-Dev generations: https://www.reddit.com/r/comfyui/comments/1eqepmv/3000_images_from_img2txt2img_generated_with/
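These checkerboard patterns typically arise when a transposed convolution's kernel size is not divisible by its stride, so neighbouring output pixels receive unevenly many kernel contributions. A 1-D numpy sketch of the overlap counting (illustrative of the general mechanism, not the SD3 decoder):

```python
import numpy as np

def overlap_counts(out_len, kernel, stride):
    """How many kernel applications touch each output position of a
    1-D transposed convolution. Uneven counts are what show up as
    checkerboard ("deconvolution") artifacts in generated images."""
    counts = np.zeros(out_len, dtype=int)
    for start in range(0, out_len - kernel + 1, stride):
        counts[start:start + kernel] += 1
    return counts

# kernel=3, stride=2: kernel not divisible by stride -> uneven interior
print(overlap_counts(9, 3, 2))  # [1 1 2 1 2 1 2 1 1]
# kernel=4, stride=2: divisible -> even interior, no checkerboard
print(overlap_counts(8, 4, 2))  # [1 1 2 2 2 2 1 1]
```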

These are (unfortunate) vanilla Flux-Dev generations
3000-images-from-img2txt2img-generated-with-flux-dev-and-v0-7k2ox060n8id1.webp

Thanks for the reply!
When I trained for even more steps (~10000), the images look like this:

10000steps.png

I don't know, sorry
