A dynamic parallel method for performance optimization on hybrid CPUs Paper • 2411.19542 • Published Nov 29 • 5
view article Article Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon May 9 • 12
view article Article Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding Jan 30 • 9
Intel Neural Chat Collection Fine-tuned 7B parameter LLM models, one of which made it to the top of the 7B HF LLM Leaderboard • 15 items • Updated Aug 23 • 2
TEQ: Trainable Equivalent Transformation for Quantization of LLMs Paper • 2310.10944 • Published Oct 17, 2023 • 9
Efficient Post-training Quantization with FP8 Formats Paper • 2309.14592 • Published Sep 26, 2023 • 10
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs Paper • 2309.05516 • Published Sep 11, 2023 • 9
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs Paper • 2306.16601 • Published Jun 28, 2023 • 4