NanoFlow: Towards Optimal Large Language Model Serving Throughput • Paper • 2408.12757 • Published Aug 22, 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models • Paper • 2402.07033 • Published Feb 10, 2024
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence • Paper • 2306.07075 • Published Jun 12, 2023