mdouglas's Collections
Papers: Models
Llemma: An Open Language Model For Mathematics
Paper • 2310.10631 • Published • 50
Mistral 7B
Paper • 2310.06825 • Published • 47
Qwen Technical Report
Paper • 2309.16609 • Published • 35
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper • 2309.11568 • Published • 10
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 87
XGen-7B Technical Report
Paper • 2309.03450 • Published • 8
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 243
LLaMA: Open and Efficient Foundation Language Models
Paper • 2302.13971 • Published • 13
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 27
Scaling Instruction-Finetuned Language Models
Paper • 2210.11416 • Published • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 10
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 16
Skywork: A More Open Bilingual Foundation Model
Paper • 2310.19341 • Published • 5
SkyMath: Technical Report
Paper • 2310.16713 • Published • 2
LaMDA: Language Models for Dialog Applications
Paper • 2201.08239 • Published • 4
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Paper • 2310.06694 • Published • 4
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Paper • 2311.08552 • Published • 7
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 89
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41
Mixtral of Experts
Paper • 2401.04088 • Published • 158
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138
H2O-Danube-1.8B Technical Report
Paper • 2401.16818 • Published • 17
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper • 2402.07827 • Published • 45