World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 35
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 77
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15 • 28
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1 • 29
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published Apr 12 • 26
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16 • 25
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 44
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20 • 33
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24 • 43
LLMs achieve adult human performance on higher-order theory of mind tasks Paper • 2405.18870 • Published May 29 • 15
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 23 days ago • 26
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 19 days ago • 60
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published 22 days ago • 49
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published 23 days ago • 19
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published 17 days ago • 38
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 18 days ago • 30
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published 17 days ago • 28
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published 12 days ago • 39
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published 12 days ago • 10
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published 9 days ago • 74
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published 9 days ago • 96
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 9 days ago • 27
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 4 days ago • 66
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published 3 days ago • 27