mdouglas's Collections
Detecting Pretraining Data from Large Language Models
Paper • 2310.16789 • Published • 10

Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 18

AutoMix: Automatically Mixing Language Models
Paper • 2310.12963 • Published • 14

An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 14

In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 29

Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 123

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520 • Published • 10

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 31

Efficient Streaming Language Models with Attention Sinks
Paper • 2309.17453 • Published • 13

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18

Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 38

Knowledge Distillation of Large Language Models
Paper • 2306.08543 • Published • 20
A Repository of Conversational Datasets
Paper • 1904.06472 • Published • 3

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Paper • 1704.05179 • Published • 1

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper • 1908.10084 • Published • 4

Efficient Few-Shot Learning Without Prompts
Paper • 2209.11055 • Published • 3

Attention Is All You Need
Paper • 1706.03762 • Published • 49

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper • 2005.11401 • Published • 11

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 11

Textbooks Are All You Need
Paper • 2306.11644 • Published • 142

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 49

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Paper • 2308.16884 • Published • 8

Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170

PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Paper • 2310.17752 • Published • 12
Contrastive Decoding: Open-ended Text Generation as Optimization
Paper • 2210.15097 • Published

Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37

Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 34

Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 22

Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 28

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 75

YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 65

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Paper • 2308.01320 • Published • 44

Shepherd: A Critic for Language Model Generation
Paper • 2308.04592 • Published • 31

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 5
Improving Large Language Model Fine-tuning for Solving Math Problems
Paper • 2310.10047 • Published • 5

Dialogue Act Classification with Context-Aware Self-Attention
Paper • 1904.02594 • Published

It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Paper • 2005.04364 • Published

Question rewriting? Assessing its importance for conversational question answering
Paper • 2201.09146 • Published

Can Question Rewriting Help Conversational Question Answering?
Paper • 2204.06239 • Published

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 57

Paper • 2310.20707 • Published • 10
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Paper • 2309.04564 • Published • 15

FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 35

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Paper • 2311.06243 • Published • 17

Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper • 2311.04934 • Published • 28

Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Paper • 2311.02849 • Published • 3

NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Paper • 2310.19820 • Published • 1

LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 43