DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers Paper • 2403.10266 • Published Mar 15, 2024
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22, 2024
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices Paper • 2403.01164 • Published Mar 2, 2024