Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 40 • 3
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published Oct 2, 2024 • 33 • 4
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21, 2024 • 57 • 4