Sayak Paul

sayakpaul

https://sayak.dev

AI & ML interests

Diffusion models, representation learning

Recent Activity

liked a model about 2 hours ago

Cseti/Wan-LoRA-Arcane-Jinx-v1

authored a paper about 3 hours ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

commented on a paper about 7 hours ago

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

sayakpaul's activity

upvoted an article 2 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

2 days ago

• 232

upvoted an article 14 days ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24, 2024

• 189

upvoted an article 16 days ago

Article

Distilling from Dialogues: Finding Meaning in LLM Interactions

17 days ago

• 4

upvoted a collection 16 days ago

Remote VAE Inference Endpoints

Models and handler code used in https://huggingface.co/blog/remote_vae • 5 items • Updated 4 days ago • 4

upvoted an article 18 days ago

Article

Remote VAEs for decoding with HF endpoints 🤗

18 days ago

• 36

upvoted an article 21 days ago

Article

SigLIP 2: A better multilingual vision language encoder

21 days ago

• 134

upvoted a paper 21 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 22 days ago • 129

upvoted a collection 24 days ago

PaliGemma 2 Release

Vision-Language Models available in multiple 3B, 10B and 28B variants. • 32 items • Updated 2 days ago • 145

upvoted 3 articles about 1 month ago

Article

Build awesome datasets for video generation

about 1 month ago

• 27

Article

Open-source DeepResearch – Freeing our search agents

Feb 4

• 1.16k

Article

The AI tools for Art Newsletter - Issue 1

Jan 31

• 70

upvoted an article about 2 months ago

Article

Timm ❤️ Transformers: Use any timm model with transformers

Jan 16

• 44

upvoted a paper 2 months ago

LTX-Video: Realtime Video Latent Diffusion

Paper • 2501.00103 • Published Dec 30, 2024 • 42

upvoted an article 4 months ago

Article

Let’s make a generation of amazing image generation models

By

and 4 others •

Nov 26, 2024

• 33

upvoted 2 articles 5 months ago

Article

🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Oct 22, 2024

• 50

Article

Advanced Flux Dreambooth LoRA Training with 🧨 diffusers

By

and 1 other •

Oct 21, 2024

• 34

upvoted a paper 5 months ago

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8, 2024 • 24

upvoted 2 papers 7 months ago

Enhancing Training Efficiency Using Packing with Flash Attention

Paper • 2407.09105 • Published Jul 12, 2024 • 15

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12, 2024 • 39

upvoted a collection 8 months ago

Flan-T5 release

The Flan-T5 covers 4 checkpoints of different sizes each time. It also includes upgraded versions trained using Universal sampling • 7 items • Updated 2 days ago • 23