LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15 • 105
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4 • 23
Advanced Flux Dreambooth LoRA Training with 🧨 diffusers Article • By linoyts • Published Oct 21 • 29
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 releases • 15 items • Updated Oct 24 • 508
CommonCanvas Collection A collection of models trained on the CommonCatalog datasets • 8 items • Updated May 16 • 9
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19 • 22
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2 • 12
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Paper • 2408.17131 • Published Aug 30 • 11
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated 15 days ago • 500
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27 • 28
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22 • 34
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization Paper • 2408.08019 • Published Aug 15 • 10