Organization Card

The Future of AI is Open

If you are looking for compressed models to run with vLLM, they have moved to the RedHatAI organization. We look forward to continuing to publish optimized models for open source use!

Neural Magic helps developers accelerate deep learning performance using automated model compression technologies and inference engines. Download our compression-aware inference engines and open source tools for fast model inference.

vLLM: A high-throughput and memory-efficient inference engine for at-scale deployment of performant open-source LLMs
LLM Compressor: A Hugging Face-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM

This profile provides accurate model checkpoints, compressed with state-of-the-art methods and ready to run in vLLM, in formats such as W4A16, W8A16, W8A8 (int8 and fp8), and many more. If you would like help quantizing a model, or want to request a new checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
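Format labels such as W4A16 follow a common convention: the number after W is the bit width of the stored weights, and the number after A is the bit width of the activations. The sketch below is only an illustration of that convention, not part of vLLM or LLM Compressor. The helper names `parse_scheme` and `weight_gib` and the 8-billion parameter count are hypothetical, and the size estimate ignores quantization scales, zero points, and any layers left unquantized.

```python
import re
from typing import NamedTuple


class Scheme(NamedTuple):
    """Bit widths encoded in a WxAy label (hypothetical helper type)."""
    weight_bits: int
    activation_bits: int


def parse_scheme(name: str) -> Scheme:
    """Parse a label such as 'W4A16' or 'w8a8' into its two bit widths."""
    match = re.fullmatch(r"[wW](\d+)[aA](\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a WxAy scheme label: {name!r}")
    return Scheme(int(match.group(1)), int(match.group(2)))


def weight_gib(num_params: float, scheme: Scheme) -> float:
    """Rough weight storage in GiB: parameters * bits per weight / 8 bytes.

    Assumption: every parameter is stored at the scheme's weight width;
    scales and zero points are not counted.
    """
    return num_params * scheme.weight_bits / 8 / 2**30


if __name__ == "__main__":
    # 8e9 is an assumed parameter count chosen only for illustration.
    for label in ("W4A16", "W8A16", "W8A8"):
        scheme = parse_scheme(label)
        print(f"{label}: {scheme} -> about {weight_gib(8e9, scheme):.2f} GiB of weights")
```

Under these assumptions, halving the weight width halves the estimated weight storage, while the activation width affects compute and runtime memory rather than checkpoint size.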


Spaces 2

Quant Llms Text Generation (Running): Quantized vs. Unquantized LLM: Text Generation Comparison

Sparse Llama Gsm8k (Paused): Solve math problems with chat-based guidance

Models

None public yet



Collections 14

RedHatAI/DeepSeek-R1-Distill-Llama-8B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8

RedHatAI/granite-3.1-2b-instruct-quantized.w4a16

RedHatAI/granite-3.1-2b-instruct-quantized.w8a8

RedHatAI/granite-3.1-8b-instruct-quantized.w4a16

RedHatAI/granite-3.1-8b-instruct-quantized.w8a8


Datasets 13

neuralmagic/calibration

neuralmagic/mmlu_it

neuralmagic/mmlu_fr

neuralmagic/mmlu_th

neuralmagic/mmlu_de

neuralmagic/mmlu_es

neuralmagic/mmlu_hi

neuralmagic/mmlu_pt

neuralmagic/quantized-llama-3.1-leaderboard-v2-evals

neuralmagic/quantized-llama-3.1-humaneval-evals


Team members 38
