Haihao Shen

Haihao

https://github.com/intel/auto-round

AI & ML interests

LLM quantization, sparsity, and acceleration

Recent Activity

reacted to wenhuach's post with 🚀 12 days ago

This week, OPEA Space released several new INT4 models, including nvidia/Llama-3.1-Nemotron-70B-Instruct-HF, allenai/OLMo-2-1124-13B-Instruct, THUDM/glm-4v-9b, AIDC-AI/Marco-o1, and several others. Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen! https://huggingface.co/OPEA

authored a paper 26 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

upvoted a paper 26 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

Articles

Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Haihao's activity

upvoted a paper 26 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published Nov 29 • 5

upvoted a paper 3 months ago

Effective Quantization for Diffusion Models on CPUs

Paper • 2311.16133 • Published Nov 2, 2023 • 4

upvoted an article 4 months ago

Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon

Article • May 9 • 12

upvoted an article 7 months ago

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Article • Jan 30 • 9

upvoted a collection 11 months ago

Intel Neural Chat

Fine-tuned 7B parameter LLM models, one of which made it to the top of the 7B HF LLM Leaderboard • 15 items • Updated Aug 23 • 2

upvoted 3 papers about 1 year ago

Efficient LLM Inference on CPUs

Paper • 2311.00502 • Published Nov 1, 2023 • 7

TEQ: Trainable Equivalent Transformation for Quantization of LLMs

Paper • 2310.10944 • Published Oct 17, 2023 • 9

Efficient Post-training Quantization with FP8 Formats

Paper • 2309.14592 • Published Sep 26, 2023 • 10

upvoted 2 papers over 1 year ago

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 9

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Paper • 2306.16601 • Published Jun 28, 2023 • 4