Nemotron models that have been converted and/or quantized to work well in vLLM
Michael Goin (mgoin)
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
Updated a model 3 days ago: neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic
Updated a model 3 days ago: neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-FP8-dynamic
Updated a model 3 days ago: neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4-FP8-dynamic
Collections 1
Spaces 4
Models 73
mgoin/Pixtral-Large-Instruct-2411
mgoin/Qwen2.5-Coder-32B-Instruct-fp8
mgoin/nemotron-3-8b-chat-4k-sft-hf • Text Generation • 64
mgoin/llava-onevision-qwen2-7b-ov-hf-bnb-full-4bit • Image-Text-to-Text • 50
mgoin/MiniCPM-Llama3-V-2_5-int4 • Visual Question Answering • 7
mgoin/pixtral-12b • Image-Text-to-Text • 497
mgoin/DeepSeek-Coder-V2-Lite-Instruct-FP8 • 4.07k
mgoin/Mixtral-8x7B-Instruct-v0.1-FP8 • 4
mgoin/Nemotron-nemo-checkpoints
mgoin/Minitron-4B-Base-FP8 • Text Generation • 927 • 3