MotionBooth: Motion-Aware Customized Text-to-Video Generation Paper • 2406.17758 • Published about 23 hours ago • 10
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models Paper • 2406.16863 • Published 2 days ago • 6
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 1 day ago • 40
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Paper • 2406.15704 • Published 5 days ago • 4
IRASim: Learning Interactive Real-Robot Action Simulators Paper • 2406.14540 • Published 6 days ago • 4
ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians Paper • 2406.16815 • Published 2 days ago • 5
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 4 days ago • 35
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published 2 days ago • 20
LongVA Collection Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/ • 5 items • Updated about 8 hours ago • 8
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 2 days ago • 47
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published 5 days ago • 12
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published 7 days ago • 12
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published 5 days ago • 46
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published 7 days ago • 14
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published 6 days ago • 10
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published 6 days ago • 63
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published 9 days ago • 35
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published 9 days ago • 7
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published 16 days ago • 47
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 6 days ago • 27
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published 8 days ago • 45
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published 8 days ago • 14
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published 8 days ago • 26
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 8 days ago • 28
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published 12 days ago • 34
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 9 days ago • 53
Pandora: Towards General World Model with Natural Language Actions and Video States Paper • 2406.09455 • Published 14 days ago • 12
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published 10 days ago • 11
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 9 days ago • 55
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published 12 days ago • 23
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published 9 days ago • 34
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published 14 days ago • 14
The CVPR Survival Guide: Discovering Research That's Interesting to YOU! Article • By harpreetsahota • 12 days ago • 9
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 15 days ago • 29
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published 24 days ago • 17
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 15 days ago • 52
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 16 days ago • 60
Vript Collection A large-scale video-text dataset of high-resolution videos annotated with dense and detailed captions. • 9 items • Updated 3 days ago • 2
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published 19 days ago • 23
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published 19 days ago • 49
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published 20 days ago • 19
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 20 days ago • 69
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 20 days ago • 26
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 20 days ago • 207
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion Paper • 2406.03184 • Published 21 days ago • 18
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 16
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published 23 days ago • 29