Michael Goin PRO

mgoin

mgoin_
mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

Recent Activity

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-evolcodealpaca-2of4-FP8-dynamic

updated a model 3 days ago

neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4-FP8-dynamic

mgoin's activity

upvoted a paper about 1 month ago

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7 • 70

upvoted a paper about 2 months ago

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4 • 46

upvoted a paper 4 months ago

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Paper • 2409.00492 • Published Aug 31 • 12

upvoted a collection 5 months ago

Llama-3.1 Quantization

Neural Magic quantized Llama-3.1 models • 22 items • Updated about 1 month ago • 40

upvoted a collection 6 months ago

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17 • 60

upvoted a paper 8 months ago

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Paper • 2405.03594 • Published May 6 • 7

upvoted a collection 9 months ago

Sparse Foundational Llama 2 Models

Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras • 27 items • Updated Sep 26 • 9

upvoted a collection 12 months ago

DeepSparse Sparse LLMs

Useful LLMs for DeepSparse where we've pruned at least 50% of the weights! • 10 items • Updated Sep 26 • 5

upvoted a collection about 1 year ago

Open LLM Leaderboard best models ❤️‍🔥

A daily uploaded list of models with best evaluations on the LLM leaderboard: • 61 items • Updated about 2 hours ago • 482

upvoted 2 papers about 1 year ago

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14

Sparse Finetuning for Inference Acceleration of Large Language Models

Paper • 2310.06927 • Published Oct 10, 2023 • 14

upvoted a collection about 1 year ago

Sparse Finetuning MPT

Explore our breakthrough in sparse fine-tuning LLMs! Our novel method maintains downstream accuracy even with >70% sparsity. • 13 items • Updated Sep 26 • 3

upvoted 2 papers over 1 year ago

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Paper • 2203.07259 • Published Mar 14, 2022 • 3

Platypus: Quick, Cheap, and Powerful Refinement of LLMs

Paper • 2308.07317 • Published Aug 14, 2023 • 23