PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models Paper • 2402.08714 • Published Feb 13 • 10
Data Engineering for Scaling Language Models to 128K Context Paper • 2402.10171 • Published Feb 15 • 18
RLVF: Learning from Verbal Feedback without Overgeneralization Paper • 2402.10893 • Published Feb 16 • 10
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22 • 78
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22 • 17
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 106
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Paper • 2402.16822 • Published Feb 26 • 15
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 46
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters Paper • 2403.02677 • Published Mar 5 • 16
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs Paper • 2402.11753 • Published Feb 19 • 5
How Far Are We from Intelligent Visual Deductive Reasoning? Paper • 2403.04732 • Published Mar 7 • 18
Evaluating and Mitigating Discrimination in Language Model Decisions Paper • 2312.03689 • Published Dec 6, 2023 • 1
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits Paper • 2305.02547 • Published May 4, 2023 • 7
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History Paper • 2402.18216 • Published Feb 28 • 1
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29 • 14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 22
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 61
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8 • 18
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 97
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations Paper • 2404.07153 • Published Apr 10 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 28
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15 • 20
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19 • 28
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 27
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22 • 17
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22 • 38
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 69
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
Observational Scaling Laws and the Predictability of Language Model Performance Paper • 2405.10938 • Published May 17 • 10
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20 • 25
LANISTR: Multimodal Learning from Structured and Unstructured Data Paper • 2305.16556 • Published May 26, 2023 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published May 18 • 17
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published May 24 • 23
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published 30 days ago • 19
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 30 days ago • 17
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published about 1 month ago • 15
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback Paper • 2406.00888 • Published 26 days ago • 29
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published 25 days ago • 14
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM Paper • 2406.02884 • Published 24 days ago • 13
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published 25 days ago • 35
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 18 days ago • 30
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published 17 days ago • 23
Large Language Model Unlearning via Embedding-Corrupted Prompts Paper • 2406.07933 • Published 17 days ago • 6
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 16 days ago • 47
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published 15 days ago • 74
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published 16 days ago • 85
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 12 days ago • 60
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published 12 days ago • 35
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published 12 days ago • 28
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published 17 days ago • 46
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published 11 days ago • 48
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published 12 days ago • 12
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published 14 days ago • 64
Fantastic Copyrighted Beasts and How (Not) to Generate Them Paper • 2406.14526 • Published 9 days ago • 1
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published 11 days ago • 34
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems Paper • 2406.14972 • Published 8 days ago • 6
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published 10 days ago • 12
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers Paper • 2406.12430 • Published 11 days ago • 1
Weight subcloning: direct initialization of transformers using larger pretrained ones Paper • 2312.09299 • Published Dec 14, 2023 • 17
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA Paper • 2406.17419 • Published 4 days ago • 12
Large Language Models Assume People are More Rational than We Really are Paper • 2406.17055 • Published 5 days ago • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 5 days ago • 52
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published 5 days ago • 22
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Paper • 2406.18495 • Published 3 days ago • 10