gonduras's picture

gonduras

gonduras
Β·

AI & ML interests

ART+AI

Recent Activity

liked a model 2 days ago
tencent/HunyuanVideo
liked a model 12 days ago
cagliostrolab/animagine-xl-3.1
View all activity

Organizations

Stable Diffusion concepts library's profile picture

gonduras's activity

reacted to multimodalart's post with ❀️ 7 months ago
view post
Post
The Stable Diffusion 3 research paper broken down, including some overlooked details! πŸ“

Model
πŸ“ 2 base model variants mentioned: 2B and 8B sizes

πŸ“ New architecture in all abstraction levels:
- πŸ”½ UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention πŸ‘‹
- πŸ†• Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model

πŸ“„ 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

πŸ—ƒοΈ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
πŸ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
βœ… State of the art in automated evals for composition and prompt understanding
βœ… Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
Β·