MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Paper • 2410.20771 • Published Oct 28, 2024 • 3
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 23 days ago • 80