Questions

#2
by LOL2024 - opened

Hello, I just have some questions about this model. Could you tell me how you trained it: fine-tuned, or trained from scratch? How many pictures were used to train this model? Why is the architecture SD1.5 instead of SDXL or other architectures (such as Flux.1 Schnell or PixArt-Sigma)? And will you add CC0 images from Wikimedia Commons to the training dataset in the future?

Excellent questions, all of them will be answered in a paper soon. The model was trained from scratch, not fine-tuned, and we used the Microcosmos dataset for this (you can check it out here: Microcosmos on HuggingFace); I'm still working on a final version. Right now it's not the final version, but it's a good start. We used about 15k images, which is a bit on the lower side (the ideal number is closer to 50k), so it's going to be overfitted, but still usable. I want to reach the mark of half a million quality, captioned CC0 images.

I went with Stable Diffusion 1.5 because it's powerful but also less computationally demanding than something like SDXL or Flux.1 Schnell; I have just a 3090 and two 3060s. I actually think good v-prediction models (which could be our next iteration) are on par with Flux and SDXL, so there's room to experiment with that in the future. As for Wikimedia Commons, it's a bit tricky because while a lot of it is marked CC0, sometimes it's not actually free to use, so if you're aiming for 100% correct usage, you have to be super careful with those images.
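For context, switching an SD1.5-class pipeline to v-prediction is mostly a scheduler setting at inference time (the UNet also has to be trained with that target). A minimal sketch in diffusers, assuming the stock runwayml/stable-diffusion-v1-5 checkpoint as a placeholder:

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

# Placeholder checkpoint; swap in whichever model repo you actually run.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)

# v-prediction changes the training target from noise (epsilon) to a
# "velocity" term, so the scheduler must be told what the UNet predicts.
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, prediction_type="v_prediction"
)
pipe = pipe.to("cuda")
```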

One of the biggest challenges now is producing high-quality captions. I'm using GIT (Generative Image-to-text Transformer) and Gemini, with human reviewing of everything; with a small number of images, each error is much more significant. It saddens me that many cool things can't go in the data because they are all copyrighted. I hope to see how legislators will treat this situation, because I'm considering adding some AI-generated content to the dataset, which would help with the missing concepts.
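To illustrate the GIT half of that pipeline, here is a minimal captioning sketch with transformers, assuming the public microsoft/git-base-coco checkpoint (not necessarily the exact one used for Microcosmos) and a hypothetical sample.png:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

image = Image.open("sample.png").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)  # a draft caption, still worth a human review before use
```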

Thanks for the answer, this model is great, I hope I can see the final version soon. However, I have another question: will it feature a rating system similar to Pony Diffusion, using prompts such as 'score_9', 'score_7_up', 'source_cartoon', and 'rating_safe' to control the quality and content of generated images? Or, like common SD1.5 and SDXL models, using prompts such as 'masterpiece', 'best quality', 'medium quality', and 'worst quality' to control the quality of generated images?

In some subsets, like the one from OpenGameArt, I did some score rating, because some of the art was good and some was boring. I might add more nuances of that in newer classified images; most of the CC0 images are boring...

Well, could you tell me how to contribute CC0 images, or my own CC0 works (if that's possible), to the training dataset?
Just check that they really are licensed CC0, upload them, and make a pull request for them?

I think it would be like that. I usually group elements by what is most prominent and what the images contain, and then by source: which site the image came from and how it is licensed on that site. I do need to add more good or interesting illustrations.

Emm, are CC0 3D resources on opengameart.org already used in this model's training dataset, such as texture resources for 3D models and previews (or multiple views) of 3D models?

I haven't added any 3D images yet; I think it might help the model achieve better volume or reduce some flatness. There are some CC0 models on Sketchfab; I have downloaded some and taken some screenshots. But be aware that some of them are marked as CC0 when the description actually says CC BY or something else (theoretically it would be OK to add those to the dataset, but I give preference to full CC0). If you could collect some 3D screenshots, I would gladly classify them and add them to the Microcosmos dataset.

Preferably larger than or equal to 768x768.
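If it helps contributors check that requirement, here is a small sketch that keeps only images at least 768px on the short side; the folder name is just a placeholder:

```python
from pathlib import Path
from PIL import Image

def large_enough(path, min_side=768):
    """Return True if the image's shorter side is at least min_side pixels."""
    with Image.open(path) as img:
        return min(img.size) >= min_side

# "screenshots" is a hypothetical folder of candidate contributions.
accepted = [p for p in Path("screenshots").glob("*.png") if large_enough(p)]
print(f"{len(accepted)} images meet the 768x768 minimum")
```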

Sorry, I forgot to ask: what are the recommended ways to caption contributed images? WD1.4, BLIP, JoyTag, or any others? Will it use a fine-tuned CLIP model like this* in the future? Does the training dataset allow contributing images rated questionable or explicit? Could I use prompts such as "from Met Museum Open Access", "uploaded on opengameart", or "by (artist)" tags to steer the style of generated results on the model?

*Here is the fine-tuning code of the CLIP model mentioned above

I used GIT and the Gemini API (which is free but has some censorship and refuses some images), plus Jointtagger and human captioning; each one has been used in different subsets, and the performance and accuracy of the model depend on the subject.

I find it hard to believe that explicit images are CC0, but if you still want to contribute, I don't see a problem as long as you mention or highlight them. I think there are some strange things like that in the Met Museum.

Feel free to contribute any CC0 image and caption as long as it is accurate. A newer model is cooking; by the end of the week there will be something to test.

I wanted to avoid artist tags, so styles would work the way Pony XL should have handled them.


https://github.com/6DammK9/nai-anime-pure-negative-prompt/blob/main/ch02/pony_sd.md
In fact, PonyXL trained artists' styles in a pretty weird way.

In the beginning it was said that Pony XL was trained using only aesthetic styles, not artist tags, but this was partially untrue, which caused a lot of confusion because it looked like data laundering, which is something I would like to avoid. Some laws allow the use of data, even through web scraping, if there is no individualization.

The idea of the Microcosmos dataset is a small dataset with large variety; the idea of the model is to show how it is possible to create a model that can satisfy legislation like mine 🇧🇷; and finally the paper will detail all of this, probably when I finish everything.

If the laws change, all of this becomes irrelevant. Another topic I would like to talk about, since we're already at it, is "distillation", which is training a model on the best results of another one, so you end up with a model with fewer parameters, and you could say that your data is "clean".
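To make that concrete, here is a minimal sketch of the general distillation recipe: a frozen teacher's outputs become the training targets for a smaller student. Models, shapes, and the loss are purely illustrative, not any real setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(512, 512).eval()  # stand-in for a large frozen model
student = nn.Sequential(              # smaller student with fewer parameters
    nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 512)
)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(8, 512)                # a batch of toy inputs
with torch.no_grad():
    target = teacher(x)                # the teacher's outputs act as labels
loss = F.mse_loss(student(x), target)  # the student learns to mimic the teacher
loss.backward()
optimizer.step()
```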

If I were to do this the "right way" in my country, which does not have fair use or any other legal permissiveness, it would fit the definition of piracy. Most Latin American countries adopt civil law instead of common law.

The result is that people from other countries can train models with infinitely more data than any developer here could legally obtain.


Sorry, I want to know how to avoid AI-looking content when generating images with this model. Just add "ai content, generated, diffusion" to the negative prompt?

It's not possible to differentiate in some cases. kisCC0 is one of the sources of the dataset, and it contains some images I believe are AI-generated; it's hard to distinguish generic images that normally exist on free sites from ones that are generic but AI. I suggest using "generic" in the negative prompt; images I wasn't sure about I labeled that way. Also, some images on Pexels are AI-generated, but it's hard to confirm whether something is AI or not, and stock sites are flooded with this kind of image, so that's a problem for the future. AI images are not included, or at least not intentionally included, though they might be in a future iteration; for now it's safe to say that most of the training data is old-school CC0/CC with no AI content. A viable way to eliminate that aspect would be to train a negative LoRA/textual inversion on just these types of generic/AI images, to exclude those traits.
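For anyone trying that "generic" negative-prompt tip, a minimal diffusers sketch would look like this; the model path is a placeholder for whichever checkpoint you load:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/microcosmos-model",  # hypothetical local or hub path
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a hand-painted watercolor landscape",
    negative_prompt="generic, worst quality, low quality",  # "generic" per the tip
    num_inference_steps=30,
).images[0]
image.save("out.png")
```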

The next model iteration will come out after 180 hours of training on 16,452 images.

Although it's a model trained on a CC0 dataset, its latest version can actually generate Pikachu lol.
[image: grid-0000.png]

I noticed this as well. I believe it might be a remnant of a latent representation in the CLIP model itself, as this phenomenon is quite common. Even though it has not been trained with images of Pikachu, the CLIP knows how to represent it. Many CC-trained models, like CommonCanvas and others, sometimes generate images that resemble well-known characters like Pikachu. The only way to resolve this would be to completely retrain the CLIP, which would be a nightmare.

But the same doesn't happen with Lucario or other Pokémon; that's why I am sure it's a CLIP thing.
[image: grid-0000.png]
lucario, pokemon , digital_media_(illustration), best quality, 8k,
Negative prompt: worst quality, low quality, blur, EasyNegative, lowres, bad anatomy, bad hands, text, missing fingers, extra digit, fewer digits, blur, low res
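One way to poke at that hypothesis is to look at the text encoder directly: if CLIP produces a meaningful embedding for a name it saw during its own pretraining, the concept can surface regardless of the diffusion training data. A minimal sketch, assuming the standard openai/clip-vit-large-patch14 weights (the encoder family SD1.5 uses, not necessarily this model's exact copy):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def embed(prompt: str) -> torch.Tensor:
    """Pooled CLIP text embedding for a prompt."""
    tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).pooler_output

# Both names tokenize and embed, but how strongly each maps to visual
# features is a property of CLIP's pretraining, not of the CC0 dataset.
print(torch.cosine_similarity(embed("pikachu"), embed("lucario")))
```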

Pixel-Dust changed discussion status to closed
Pixel-Dust changed discussion status to open

I pressed close instead of commenting lol. Another aspect is that even though "you can't register a style, only individual works", that doesn't apply to characters. This creates a complex situation with fan art, parodies, and imitations: the style itself can be used, but the characters it contains can't. That's one of the reasons why the dataset doesn't include some art, even pieces that were marked as CC0 or that were given to me by artists, because the characters were from some story or something like that.
