Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 5 days ago • 106
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published 17 days ago • 23
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published 17 days ago • 38
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 19 days ago • 60
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published 26 days ago • 42
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published 29 days ago • 11
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023 • 29
Trained on AWS Trainium Collection Collection of models on Hugging Face that have been trained on AWS Trainium. Learn more here: https://huggingface.co/docs/optimum-neuron/index • 7 items • Updated May 7 • 6
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25 • 15
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 240
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published Apr 19 • 21
On the Scalability of Diffusion-based Text-to-Image Generation Paper • 2404.02883 • Published Apr 3 • 17
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models Paper • 2404.01367 • Published Apr 1 • 19
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30 • 40
Are We on the Right Way for Evaluating Large Vision-Language Models? Paper • 2403.20330 • Published Mar 29 • 6
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings Paper • 2403.07750 • Published Mar 12 • 19
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 179
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 19
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices Paper • 2311.16567 • Published Nov 28, 2023 • 21
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 47
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 84
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 41
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis Paper • 2312.13834 • Published Dec 20, 2023 • 26
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 66
StarVector: Generating Scalable Vector Graphics Code from Images Paper • 2312.11556 • Published Dec 17, 2023 • 26
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing Paper • 2312.07409 • Published Dec 12, 2023 • 22
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations Paper • 2312.04655 • Published Dec 7, 2023 • 19
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior Paper • 2312.06655 • Published Dec 11, 2023 • 21
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models Paper • 2312.06585 • Published Dec 11, 2023 • 26
Evaluation of Large Language Models for Decision Making in Autonomous Driving Paper • 2312.06351 • Published Dec 11, 2023 • 5
Photorealistic Video Generation with Diffusion Models Paper • 2312.06662 • Published Dec 11, 2023 • 23
Efficient Quantization Strategies for Latent Diffusion Models Paper • 2312.05431 • Published Dec 9, 2023 • 11
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Paper • 2312.05107 • Published Dec 8, 2023 • 35
Analyzing and Improving the Training Dynamics of Diffusion Models Paper • 2312.02696 • Published Dec 5, 2023 • 31
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023 • 25
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs Paper • 2311.09257 • Published Nov 14, 2023 • 43
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 25
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Paper • 2311.05332 • Published Nov 9, 2023 • 7
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 76
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 31
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 39
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design Paper • 2310.15144 • Published Oct 23, 2023 • 12
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics Paper • 2310.13268 • Published Oct 20, 2023 • 15
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation Paper • 2310.08541 • Published Oct 12, 2023 • 17
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 75
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Paper • 2310.00426 • Published Sep 30, 2023 • 60