gonduras

AI & ML interests

ART+AI

gonduras's activity

liked a model 2 days ago

tencent/HunyuanVideo

Text-to-Video • Updated 17 days ago • 9.26k • 1.34k

liked 2 models 12 days ago

monster-labs/control_v1p_sd15_qrcode_monster

Updated Jul 21, 2023 • 89.5k • 1.37k

cagliostrolab/animagine-xl-3.1

Text-to-Image • Updated Mar 18, 2024 • 307k • 639

liked 2 models 14 days ago

genmo/mochi-1-preview

Text-to-Video • Updated 16 days ago • 42.3k • 1.13k

Lightricks/LTX-Video

Image-to-Video • Updated 15 days ago • 82.6k • 811

liked 3 models 22 days ago

liked a model about 2 months ago

briaai/RMBG-2.0

Image Segmentation • Updated 11 days ago • 255k • 552

liked 3 models 4 months ago

PixArt-alpha/PixArt-Sigma

Updated Apr 22, 2024 • 93

fudan-generative-ai/hallo

Updated Jul 11, 2024 • 84

google/gemma-2-27b

Text Generation • Updated Aug 7, 2024 • 56.7k • 192

liked a model 5 months ago

city96/FLUX.1-dev-gguf

Text-to-Image • Updated Aug 18, 2024 • 146k • 785

liked 4 models 7 months ago

stabilityai/stable-diffusion-3-medium

Text-to-Image • Updated Aug 12, 2024 • 21.8k • 4.65k

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • Updated Sep 27, 2024 • 1.19M • 3.73k

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27, 2024 • 497k • 5.94k

Phind/Phind-CodeLlama-34B-v2

Text Generation • Updated Aug 28, 2023 • 1.77k • 830

liked a Space 7 months ago

😻 ToonCrafter

Running on Zero • 863

liked a model 7 months ago

Intel/ldm3d

Text-to-3D • Updated Mar 1, 2024 • 89 • 52

reacted to multimodalart's post with ❤️ 7 months ago

Post

The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
📏 2 base model variants mentioned: 2B and 8B sizes

📐 New architecture in all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention 👋
- 🆕 Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model
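
For readers unfamiliar with the rectified-flow bullet above, here is a minimal sketch of the general idea, not the paper's training code. It assumes the common straight-line formulation, where a sample at time t is a linear blend of a data point and a noise sample, and the regression target is the constant velocity between them. The function names `rectified_flow_pair` and `euler_sample` are hypothetical, and plain Python lists stand in for latent tensors.

```python
def rectified_flow_pair(x0, noise, t):
    """Return (x_t, velocity_target) on the straight line from data to noise.

    Assumed convention: t=0 is clean data, t=1 is pure noise.
    """
    # Linear interpolation between the data point and the noise sample.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    # Along a straight line the velocity is constant: noise minus data.
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity


def euler_sample(velocity_fn, x1, steps=10):
    """Integrate from t=1 (noise) back to t=0 (data) with plain Euler steps."""
    x = list(x1)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)  # in practice, a trained network's prediction
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: an "oracle" velocity function that already knows the true target.
x0 = [1.0, 2.0]
noise = [0.0, -2.0]
x_half, v = rectified_flow_pair(x0, noise, 0.5)
recovered = euler_sample(lambda x, t: v, noise, steps=10)
```

In a real model the lambda would be replaced by a network trained to predict the velocity from `x_t`, `t`, and the text conditioning. The oracle is used here only to show that following the straight path from noise leads back to the data point.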

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
🔁 A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
