MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Paper • 2502.15872 • Published 17 days ago • 4
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published 14 days ago • 16
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets • 6 items • Updated 13 days ago • 13
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published 12 days ago • 6
IntelLabs/sqft-qa-sparsepeft-mistral-7b-v0.3-50-gptq-math-heu Text Generation • Updated 26 days ago • 158 • 3
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? Paper • 2502.11895 • Published 21 days ago • 1
Hamanasu Collection A brand-new series of models from yours truly, designed for intelligence, creativity, and roleplay. • 13 items • Updated 3 days ago • 4
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization Paper • 2502.02631 • Published Feb 4 • 2
Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales Paper • 2502.01908 • Published Feb 4 • 1
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published about 1 month ago • 42