Lifelong Language Pretraining with Distribution-Specialized Experts Paper • 2305.12281 • Published May 20, 2023 • 1
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models Paper • 2305.14705 • Published May 24, 2023
Principled Architecture-aware Scaling of Hyperparameters Paper • 2402.17440 • Published Feb 27, 2024
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation Paper • 2112.09747 • Published Dec 17, 2021
PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs Paper • 2502.00963 • Published 16 days ago • 16
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 19