-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 123 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 1 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 67
Collections
Discover the best community collections!
Collections including paper arxiv:2203.02155
-
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 15 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 45 -
Statistical Rejection Sampling Improves Preference Optimization
Paper • 2309.06657 • Published • 13 -
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper • 2405.14734 • Published • 9
-
Attention Is All You Need
Paper • 1706.03762 • Published • 43 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 5 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 241
-
Attention Is All You Need
Paper • 1706.03762 • Published • 43 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 4 -
Attention Is All You Need
Paper • 1706.03762 • Published • 43 -
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper • 2005.11401 • Published • 11 -
Language Model Evaluation Beyond Perplexity
Paper • 2106.00085 • Published
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 3 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 45 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 142 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 15
-
Attention Is All You Need
Paper • 1706.03762 • Published • 43 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14
-
Attention Is All You Need
Paper • 1706.03762 • Published • 43 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11 -
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper • 2201.11903 • Published • 9 -
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 70