BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 7 days ago • 39 • 8
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048 • Published 6 days ago • 33 • 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 5 days ago • 52 • 3
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 23 days ago • 36 • 3
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 23 days ago • 26 • 2
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 23 days ago • 69 • 4
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published 28 days ago • 12 • 1
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation Paper • 2406.00908 • Published 26 days ago • 11 • 1
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published 26 days ago • 29 • 1
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published 26 days ago • 17 • 2
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published 26 days ago • 42 • 3
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published May 28 • 18 • 2
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 17 • 3
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published May 27 • 10 • 1
Transformers Can Do Arithmetic with the Right Embeddings Paper • 2405.17399 • Published May 27 • 49 • 2
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43 • 7
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 28 • 3
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23 • 15 • 9
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published May 22 • 22 • 5
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53 • 8
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 44 • 10
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published May 16 • 110 • 11
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15 • 14 • 1
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published May 15 • 23 • 1
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published May 14 • 11 • 1
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published May 7 • 35 • 2
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116 • 9
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 106 • 11
STT: Stateful Tracking with Transformers for Autonomous Driving Paper • 2405.00236 • Published Apr 30 • 7 • 2
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 25 • 5
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings Paper • 2404.16820 • Published Apr 25 • 15 • 2
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56 • 9
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 49 • 5
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124 • 14
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38 • 9
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 240 • 41
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting Paper • 2404.06903 • Published Apr 10 • 14 • 3
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123 • 11
StableDrag: Stable Dragging for Point-based Image Editing Paper • 2403.04437 • Published Mar 7 • 24 • 4
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 30 • 3
CodePlan: Repository-level Coding using LLMs and Planning Paper • 2309.12499 • Published Sep 21, 2023 • 69 • 14
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering Paper • 2311.12775 • Published Nov 21, 2023 • 28 • 3
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 11 • 1
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation Paper • 2311.01455 • Published Nov 2, 2023 • 25 • 2
HyperFields: Towards Zero-Shot Generation of NeRFs from Text Paper • 2310.17075 • Published Oct 26, 2023 • 13 • 2
3D-GPT: Procedural 3D Modeling with Large Language Models Paper • 2310.12945 • Published Oct 19, 2023 • 52 • 2
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection Paper • 2310.11511 • Published Oct 17, 2023 • 65 • 5