- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 51
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 18
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 32
rubbyninja
AI & ML interests
None yet
Recent Activity
updated a collection 2 days ago: advancing research
upvoted a paper 2 days ago: s1: Simple test-time scaling
upvoted a paper 12 days ago: Better & Faster Large Language Models via Multi-token Prediction
Organizations
None yet
Collections
1
Models
None public yet
Datasets
None public yet