Parallelizing Linear Transformers with the Delta Rule over Sequence Length Paper • 2406.06484 • Published Jun 10 • 3