Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19 • 54
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying Paper • 2311.09578 • Published Nov 16, 2023 • 14