Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published Nov 2024
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published Nov 7
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Paper • 2410.23918 • Published Oct 31
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published Oct 3
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11