Daily Papers

by AK and the research community

Jun 13

Submitted by

yulunliu

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

·
6 authors

Submitted by

akhaliq

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

·
7 authors

Submitted by

myownskyW7

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

·
9 authors

Submitted by

akhaliq

What If We Recaption Billions of Web Images with LLaMA-3?

·
12 authors

Submitted by

robmchinst

Are We Done with MMLU?

·
16 authors

Submitted by

yixinsong

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

·
6 authors

Submitted by

Liuff23

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

·
6 authors

Submitted by

lixin4ever

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

·
11 authors

Submitted by

jedyang97

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

·
7 authors

Submitted by

akhaliq

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

·
14 authors

Submitted by

yixinsong

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

·
7 authors

Submitted by

GlyphByT5

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

·
8 authors

Submitted by

akhaliq

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

·
4 authors

Submitted by

akhaliq

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

·
5 authors

Submitted by

chrlu

Discovering Preference Optimization Algorithms with and for Large Language Models

·
7 authors

Submitted by

sheryc

VCR: Visual Caption Restoration

·
9 authors

Submitted by

yifanzhang114

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

·
7 authors

Submitted by

AliBehrouz

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

·
3 authors

Submitted by

chrisliu298

Large Language Model Unlearning via Embedding-Corrupted Prompts

·
4 authors

Submitted by

mgvz

Hibou: A Family of Foundational Vision Transformers for Pathology

·
3 authors

Submitted by

thjashin

Simplified and Generalized Masked Diffusion for Discrete Data

·
5 authors