MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model Paper • 2402.03766 • Published Feb 6 • 12
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29 • 14
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 104
FABLES: Evaluating faithfulness and content selection in book-length summarization Paper • 2404.01261 • Published Apr 1 • 3
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints Paper • 2305.13245 • Published May 22, 2023 • 5
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 126
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores Paper • 2311.05908 • Published Nov 10, 2023 • 12
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases Paper • 2404.13207 • Published Apr 19
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 119
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8 • 10
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15 • 18
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16 • 26
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 46
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23 • 27
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30 • 22
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi-Layered Thoughts Paper • 2405.19893 • Published May 30 • 29
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18 • 16
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 63
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 11 • 37
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Paper • 2406.08707 • Published Jun 13 • 15
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21 • 62
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 86
Preference Tuning For Toxicity Mitigation Generalizes Across Languages Paper • 2406.16235 • Published Jun 23 • 11
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 87
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models Paper • 2406.16838 • Published Jun 24 • 2
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 93
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5 • 27
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 37
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation Paper • 2407.07093 • Published Jul 9 • 1
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15 • 22
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17 • 33
On the Limitations of Compute Thresholds as a Governance Strategy Paper • 2407.05694 • Published Jul 8 • 2
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 75
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling Paper • 2312.15166 • Published Dec 23, 2023 • 56
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1 • 45
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 35
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 41
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 57
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 63
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 123
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 37
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27 • 52
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29 • 47
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4 • 28
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9 • 45
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published Sep 9 • 29
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published Sep 4 • 55
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16 • 39
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14 • 23
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18 • 43
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 103
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices Paper • 2408.09169 • Published Aug 17 • 1
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely Paper • 2409.14924 • Published Sep 23 • 1
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 36
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models Paper • 2410.05229 • Published Oct 7 • 21
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1 • 9
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2 • 21
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Paper • 2006.16236 • Published Jun 29, 2020 • 3
FlexAttention for Efficient High-Resolution Vision-Language Models Paper • 2407.20228 • Published Jul 29 • 1
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20 • 12
LBPE: Long-token-first Tokenization to Improve Large Language Models Paper • 2411.05504 • Published Nov 8 • 1
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7 • 49
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation Paper • 2411.04997 • Published Nov 7 • 37
Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12 • 62
GIT: A Generative Image-to-text Transformer for Vision and Language Paper • 2205.14100 • Published May 27, 2022 • 1
Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation Paper • 2409.11860 • Published Sep 18 • 1
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20 • 38
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 41
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22 • 55
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11 • 32
BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once Paper • 2405.12971 • Published May 21 • 2
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16 • 2
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29 • 50
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Paper • 2407.00079 • Published Jun 24 • 5
Transformers Can Navigate Mazes With Multi-Step Prediction Paper • 2412.05117 • Published Dec 6 • 5