- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training (arXiv:2405.15319)
- Aya 23: Open Weight Releases to Further Multilingual Progress (arXiv:2405.15032)
- Transformers Can Do Arithmetic with the Right Embeddings (arXiv:2405.17399)
- Yuan 2.0-M32: Mixture of Experts with Attention Router (arXiv:2405.17976)
- LLM Augmented LLMs: Expanding Capabilities through Composition (arXiv:2401.02412, published Jan 4, 2024)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation (arXiv:2310.03214, published Oct 5, 2023)
- Meta-Transformer: A Unified Framework for Multimodal Learning (arXiv:2307.10802, published Jul 20, 2023)