EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation • Paper • 2411.08380 • Published 9 days ago • 24
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation • Paper • 2411.07975 • Published 10 days ago • 24
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions • Paper • 2411.07461 • Published 10 days ago • 21
Scaling Properties of Diffusion Models for Perceptual Tasks • Paper • 2411.08034 • Published 10 days ago • 13
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings • Paper • 2411.08017 • Published 10 days ago • 11
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders • Paper • 2410.22366 • Published 25 days ago • 74
SmolLM2 • Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 10 items • Updated about 23 hours ago • 172
timm tiny test models • Collection • A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3
Scalable Ranked Preference Optimization for Text-to-Image Generation • Paper • 2410.18013 • Published 30 days ago • 14
Scaling Diffusion Language Models via Adaptation from Autoregressive Models • Paper • 2410.17891 • Published 30 days ago • 15
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models • Paper • 2410.17637 • Published about 1 month ago • 34
WorldSimBench: Towards Video Generation Models as World Simulators • Paper • 2410.18072 • Published 30 days ago • 17
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction • Paper • 2410.17247 • Published about 1 month ago • 43