- Speculative Streaming: Fast LLM Inference without Auxiliary Models
  Paper • 2402.11131 • Published • 42
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 6
- Recurrent Drafter for Fast Speculative Decoding in Large Language Models
  Paper • 2403.09919 • Published • 20
- On Speculative Decoding for Multimodal Large Language Models
  Paper • 2404.08856 • Published • 13
Linkun
hugg1ngfac3