Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 26 days ago • 142
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated 16 days ago • 106
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 18 days ago • 396
HIGGS Collection Models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization. Requires the latest `transformers` to run. • 18 items • Updated 13 days ago • 6
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 292
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 16 days ago • 560
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 66
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17, 2024 • 67
view article Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1, 2024 • 73
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 256
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22, 2024 • 82