Greetings

#1
by RedSparkie - opened

Hey Alex,
I've been following you for some time because of your musical projects. At first I suspected this Tegridy thing was based on South Park, but I wasn't sure. Now I can confirm it is...
What have you done here? Fine-tuning an Imagen model on 128px square images? And most importantly, why? I mean, if you wanted to create South Park-like images, using a LoRA for SD 3.5 or Flux would be better I guess, and I could help you with that (even though I guess there are already several on Civitai).
Just a follower trying to help,
have a nice day ^^.

@RedSparkie Hey RedSparkie,

Thank you for writing and for offering to help :) I really appreciate it :)

Yes, the Tegridy thing is from South Park because I am indeed a big fan of the show :) Although I am only into the classic stuff (Seasons 1-18).

Yes, I wanted to see if pre-training a diffusion model from scratch would produce better results in terms of style-matching, because fine-tuning usually produces distorted results. However, if you think SD3.5 with LoRA would be able to handle the exact style, I would love to collaborate with you to make it happen :)

Here are some samples from my Imagine-South-Park large dataset:

sample-1000.png
sample-1020.png

As you can see, it did manage to draw the faces properly (more or less) but not the background or bodies... Not sure why...

Let me know.

Sincerely,

Alex

Hi Alex,

It has indeed captured the South Park style very well for a 128px model. The overall aesthetic is quite on point. However, the model seems to struggle with the characters' morphology, especially in separating bodies from background elements accurately.

You’re also correct regarding the current LoRAs for Flux Dev: they can be somewhat inconsistent and occasionally produce artifacts that disrupt the desired style (one of them on Civitai has Kenny wearing his sweater, and the proportions of his body are strange).

As a potential solution, I’ll try training a LoRA for Stable Diffusion 3.5 myself. If that doesn't work, another option could be to carefully tag every image of a dataset taken from a Wiki, so the model can learn each character separately, and then add another set of general images from the show.
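A minimal sketch of the tagging idea, assuming a trainer that reads one caption `.txt` file next to each image with the same base name (a common convention, but check your trainer). The trigger phrase `south park style`, the `labels` mapping, and both function names are hypothetical and only here for illustration:

```python
from pathlib import Path

STYLE_TAG = "south park style"  # hypothetical trigger phrase, pick your own


def build_caption(characters, extra_tags=()):
    """Join the style tag, character names, and extra tags into one caption."""
    tags = [STYLE_TAG, *characters, *extra_tags]
    seen = set()
    unique = []
    for tag in tags:
        key = tag.strip().lower()  # normalize so "Kenny" and "kenny" match
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return ", ".join(unique)


def write_captions(image_dir, labels):
    """Write <image stem>.txt next to each labeled image; return written paths."""
    image_dir = Path(image_dir)
    written = []
    for filename, characters in labels.items():
        image_path = image_dir / filename
        if not image_path.exists():
            continue  # skip labels that have no matching image file
        caption_path = image_path.with_suffix(".txt")
        caption_path.write_text(build_caption(characters), encoding="utf-8")
        written.append(caption_path)
    return written
```

The `labels` mapping (file name to list of character names) would come from the Wiki pages; the general images of the show could simply get an empty character list so they carry only the style tag.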

Unfortunately, I don’t have experience with training models from scratch, so I wouldn’t be of much help in that regard. Nonetheless, if we can get the LoRA training done well, it might stabilize the style generation and eliminate many artifacts.

Why did you try training from scratch with 128px images? Was it a lack of compute power?

Best regards,
RedSparkie.

@RedSparkie Thank you for your response :)

Yes, try SD3.5 Large with LoRA. IMHO SD3.5 Large is much better than Flux-dev so it may produce better results.

To answer your question, I used 128x128 with Imagen because I wanted to see if it would work at all and because I wanted to see if pre-training on a pure South Park dataset would preserve the style.

Otherwise, let me know about the LoRA results. I can help with training because I have some compute, but I will need good code because I have no idea how to do it.

Let me know.

Sincerely,

Alex
