Hydragen: High-Throughput LLM Inference with Shared Prefixes • Paper • 2402.05099 • Published Feb 7
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs • Paper • 2410.16144 • Published Oct 21