Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates Paper • 2206.00832 • Published Jun 2, 2022
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Paper • 2206.14098 • Published Jun 28, 2022
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models Paper • 2303.10464 • Published Mar 18, 2023
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency Paper • 2303.11525 • Published Mar 21, 2023
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms Paper • 2311.13133 • Published Nov 22, 2023
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining Paper • 2312.17482 • Published Dec 29, 2023
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images Paper • 2310.16825 • Published Oct 25, 2023