-
Bootstrapping Language Models with DPO Implicit Rewards
Paper • 2406.09760 • Published • 36 -
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 54 -
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Paper • 2406.14544 • Published • 33 -
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper • 2406.14491 • Published • 74
Collections
Discover the best community collections!
Collections including paper arxiv:2406.08464
-
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 24 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 58 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 46 -
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 46
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 62 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 110 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 52 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 77
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 80 -
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Paper • 2404.10667 • Published • 13 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 24 -
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 23
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 105 -
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 39 -
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 24
-
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 75 -
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 102 -
ReFT: Representation Finetuning for Language Models
Paper • 2404.03592 • Published • 74 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 58
-
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 46 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 27 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 12 -
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
Paper • 2405.18952 • Published • 7