FlashDecoding++: Faster Large Language Model Inference on GPUs • arXiv:2311.01282 • Published Nov 2, 2023
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache • arXiv:2401.02669 • Published Jan 5, 2024
Speculative Streaming: Fast LLM Inference without Auxiliary Models • arXiv:2402.11131 • Published Feb 16, 2024