Apply for community grant: Academic project (gpu)
#1 by yuhuili - opened
We introduce EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a new baseline for fast decoding of Large Language Models (LLMs) that provably preserves the output distribution of vanilla decoding. The approach extrapolates the second-top-layer contextual feature vectors of the LLM, yielding a significant boost in generation efficiency.
- EAGLE is:
- 3x faster than vanilla decoding (13B).
- 2x faster than Lookahead (13B).
- 1.6x faster than Medusa (13B).
- provably consistent with vanilla decoding in the distribution of generated text.
- trainable (within 1-2 days) and testable on 8x RTX 3090 GPUs, so even the GPU poor can afford it.
- combinable with other parallel techniques such as vLLM, Mamba, FlashAttention, quantization, and hardware optimization.
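
For readers curious how feature-level drafting fits into a draft-then-verify loop, here is a minimal, self-contained PyTorch sketch. It is not the EAGLE codebase: `TinyBase`, `DraftHead`, the toy hidden sizes, and the greedy draft/acceptance rule are hypothetical stand-ins, and EAGLE's lossless-distribution guarantee relies on speculative-sampling-style verification and tree-structured drafts rather than the simplified greedy check shown here.

```python
# Illustrative sketch only (assumed names and shapes, not the EAGLE implementation).
import torch
import torch.nn as nn

HIDDEN, VOCAB = 64, 100  # toy sizes for the sketch, not EAGLE's real configuration


class TinyBase(nn.Module):
    """Stand-in for the target LLM: returns second-top-layer features and logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.body = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):
        feats, _ = self.body(self.embed(tokens))   # plays the role of the second-top-layer features
        return feats, self.lm_head(feats)          # features and token logits


class DraftHead(nn.Module):
    """EAGLE-style drafter: extrapolates the next feature from the current
    feature plus the embedding of the sampled token (simplified)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(2 * HIDDEN, HIDDEN)

    def forward(self, feat, tok_embed):
        return self.proj(torch.cat([feat, tok_embed], dim=-1))


@torch.no_grad()
def speculative_step(base, draft, prefix, k=4):
    """Draft k tokens cheaply with the small head, then verify them
    with a single forward pass of the base model."""
    feats, logits = base(prefix)
    feat, drafted = feats[:, -1], []
    for _ in range(k):
        tok = logits[:, -1].argmax(-1, keepdim=True)    # greedy draft for brevity
        drafted.append(tok)
        feat = draft(feat, base.embed(tok).squeeze(1))  # extrapolate the next feature
        logits = base.lm_head(feat).unsqueeze(1)
    candidate = torch.cat([prefix] + drafted, dim=1)
    _, verify_logits = base(candidate)                  # one verification pass
    # Accept the longest prefix of drafted tokens the base model agrees with.
    accepted = []
    for i, tok in enumerate(drafted):
        pos = prefix.size(1) - 1 + i
        if verify_logits[:, pos].argmax(-1, keepdim=True).eq(tok).all():
            accepted.append(tok)
        else:
            break
    return torch.cat([prefix] + accepted, dim=1) if accepted else prefix


base, draft = TinyBase(), DraftHead()
out = speculative_step(base, draft, torch.randint(0, VOCAB, (1, 5)))
print(out.shape)
```

The intent of the sketch is only to show why drafting in feature space is cheap: the drafter never runs the full transformer stack, yet every drafted token is still checked by the base model before it is kept.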
Hi @yuhuili, we have assigned a GPU to this Space. Note that GPU grants are provided temporarily and might be removed after some time if usage is very low.
To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus
Thank you so much for assigning a GPU! I really appreciate the support.