
Michael Goin PRO

mgoin


mgoin_
mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

Recent Activity

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-FP8-dynamic

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4-FP8-dynamic



mgoin's activity

New activity in neuralmagic/Llama-3.2-90B-Vision-Instruct-FP8-dynamic 5 days ago

Model does not run with VLLM

#3 opened 6 days ago by

New activity in nm-testing/Llama-3.3-70B-Instruct-FP8-dynamic 6 days ago

Nice model, any info on scripts used to quantize?

#1 opened 9 days ago by

New activity in neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic 6 days ago

Thanks!

#2 opened 14 days ago by

New activity in mistralai/Pixtral-Large-Instruct-2411 about 1 month ago

Add config_format and load_format to vLLM args

#5 opened about 1 month ago by

Update config.json to use null for sliding_window

#4 opened about 1 month ago by

New activity in mgoin/nemotron-3-8b-chat-4k-sft-hf about 1 month ago

Adding `safetensors` variant of this model

#1 opened about 1 month ago by

New activity in neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a16 about 2 months ago

Is this the standard GPTQ quantization?

#5 opened about 2 months ago by

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 about 2 months ago

Model weights are not loaded

#3 opened 4 months ago by

New activity in neuralmagic/pixtral-12b-FP8-dynamic about 2 months ago

Update model card

#1 opened about 2 months ago by

New activity in nm-testing/llava-1.5-7b-hf-FP8-dynamic about 2 months ago

Add chat_template to tokenizer_config.json

#1 opened about 2 months ago by

New activity in neuralmagic/Mistral-Nemo-Instruct-2407-FP8 about 2 months ago

7900xtx torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+

#3 opened about 2 months ago by

New activity in mistral-community/pixtral-12b 2 months ago

Why is the Pixtral activation function "gelu" when the reference code uses "silu"?

#10 opened 2 months ago by

Update tokenizer_config.json with chat_template

#11 opened 2 months ago by

New activity in neuralmagic/Llama-3.2-90B-Vision-Instruct-FP8-dynamic 2 months ago

Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version?

#1 opened 3 months ago by

New activity in neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic 3 months ago

Oom with 24g vram

#1 opened 3 months ago by

New activity in neuralmagic/Phi-3.5-mini-instruct-FP8-KV 3 months ago

latest vllm docker (v0.6.2) fail to load

#1 opened 3 months ago by

New activity in neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 4 months ago

Issue with loading model

#1 opened 4 months ago by

New activity in neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 4 months ago

Can it run on A100/A800 with VLLM?

#1 opened 5 months ago by Parkerlambert123

New activity in neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w4a16 4 months ago

weights does not exist when trying to deploy in sagemaker endpoint

#1 opened 4 months ago by LorenzoCevolaniAXA

New activity in meta-llama/Llama-3.1-405B-Instruct 5 months ago

8-kv-heads

#17 opened 5 months ago by