Collections
Collections including paper arxiv:2401.02385
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 14
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 60
-
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 47
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 50
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Paper • 2401.12954 • Published • 28
-
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 81
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 61
Asynchronous Local-SGD Training for Language Modeling
Paper • 2401.09135 • Published • 9
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 97
-
Mixtral of Experts
Paper • 2401.04088 • Published • 154
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 68
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 81
LLaMA Pro: Progressive LLaMA with Block Expansion
Paper • 2401.02415 • Published • 50
-
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 81
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 41
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 63
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 46
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 38
Qwen Technical Report
Paper • 2309.16609 • Published • 30
GPT-4 Technical Report
Paper • 2303.08774 • Published • 3
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 44