Resonance RoPE: Improving Context Length Generalization of Large Language Models (arXiv:2403.00071, published Feb 29, 2024)
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models (arXiv:2404.12387, published Apr 18, 2024)
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619, published Apr 22, 2024)
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation (arXiv:2404.07129, published Apr 10, 2024)
Round and Round We Go! What makes Rotary Positional Encodings useful? (arXiv:2410.06205, published Oct 8, 2024)