hllj's Collections
Speculative Decoding
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Paper • 2404.18911 • Published • 29
Accelerating LLM Inference with Staged Speculative Decoding
Paper • 2308.04623 • Published • 23
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 14
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3
On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 13
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Paper • 2404.11912 • Published • 16
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 75
Better & Faster Large Language Models via Multi-token Prediction
Paper • 2404.19737 • Published • 73
Multi-Candidate Speculative Decoding
Paper • 2401.06706 • Published • 1
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Paper • 2402.02082 • Published • 1
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Paper • 2402.05109 • Published