Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 254
Post • Native tensor parallel has landed in transformers! https://github.com/huggingface/transformers/pull/34184 Thanks a lot to the torch team for their support! Contributions are welcome to support more models!
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 19
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets Paper • 2201.02177 • Published Jan 6, 2022 • 2
Article • A failed experiment: Infini-Attention, and why we should keep trying? • Aug 14 • 50
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30 • 6
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155