Collections
Collections including paper arxiv:2405.00732

- Better & Faster Large Language Models via Multi-token Prediction (Paper • 2404.19737 • Published • 73)
- Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 (Paper • 2405.00664 • Published • 18)
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (Paper • 2405.00732 • Published • 118)

- Rho-1: Not All Tokens Are What You Need (Paper • 2404.07965 • Published • 83)
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time (Paper • 2404.10667 • Published • 15)
- Instruction-tuned Language Models are Better Knowledge Learners (Paper • 2402.12847 • Published • 24)
- DoRA: Weight-Decomposed Low-Rank Adaptation (Paper • 2402.09353 • Published • 25)

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (Paper • 2402.17764 • Published • 592)
- BitNet: Scaling 1-bit Transformers for Large Language Models (Paper • 2310.11453 • Published • 96)
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Paper • 2404.02258 • Published • 103)
- TransformerFAM: Feedback attention is working memory (Paper • 2404.09173 • Published • 43)

- Rho-1: Not All Tokens Are What You Need (Paper • 2404.07965 • Published • 83)
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders (Paper • 2404.05961 • Published • 63)
- Compression Represents Intelligence Linearly (Paper • 2404.09937 • Published • 27)
- Multi-Head Mixture-of-Experts (Paper • 2404.15045 • Published • 59)

- Communicative Agents for Software Development (Paper • 2307.07924 • Published • 2)
- Self-Refine: Iterative Refinement with Self-Feedback (Paper • 2303.17651 • Published • 2)
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (Paper • 2312.10003 • Published • 34)
- ReAct: Synergizing Reasoning and Acting in Language Models (Paper • 2210.03629 • Published • 14)

- Jamba: A Hybrid Transformer-Mamba Language Model (Paper • 2403.19887 • Published • 103)
- sDPO: Don't Use Your Data All at Once (Paper • 2403.19270 • Published • 38)
- ViTAR: Vision Transformer with Any Resolution (Paper • 2403.18361 • Published • 51)
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (Paper • 2403.18814 • Published • 44)