Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.19752

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 17 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1 • 20
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1 • 75
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30 • 22

about 1 hour ago

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 11
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 4

image llm works

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30 • 20
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 49
MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12 • 73
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 123

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 69
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

Paper • 2404.19759 • Published Apr 30 • 24
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30 • 20
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting

Paper • 2404.19758 • Published Apr 30 • 10

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 24
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1 • 19

To read... eventually

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 123
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 47
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6 • 9
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 63

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 28
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 25
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Paper • 2401.01974 • Published Jan 3 • 4
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 26

Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

Paper • 2312.09608 • Published Dec 15, 2023 • 13
CodeFusion: A Pre-trained Diffusion Model for Code Generation

Paper • 2310.17680 • Published Oct 26, 2023 • 68
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

Paper • 2310.17994 • Published Oct 27, 2023 • 7
Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss

Paper • 2401.02677 • Published Jan 5 • 21

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs