DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 5 days ago • 52
MotionLLM: Understanding Human Behaviors from Human Motions and Videos Paper • 2405.20340 • Published 30 days ago • 19
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 30 days ago • 17
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 18
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 54
RewardBench: Evaluating Reward Models for Language Modeling Paper • 2403.13787 • Published Mar 20 • 19
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 52
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 39
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7 • 16
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 66
LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31 • 21
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30 • 32
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31 • 55
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29 • 46
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25 • 45
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10 • 24
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 36
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 38
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 32