Addition is All You Need for Energy-efficient Language Models • Paper • arXiv:2410.00907 • Published 19 days ago • 136 • 15
Quantifying Generalization Complexity for Large Language Models • Paper • arXiv:2410.01769 • Published 18 days ago • 12 • 2
Training Task Experts through Retrieval Based Distillation • Paper • arXiv:2407.05463 • Published Jul 7 • 6 • 1