This is Llama-3-8B with INT4 weights produced by per-channel QQQ. QQQ is a hardware-optimized W4A8 quantization scheme (4-bit weights, 8-bit activations). For more details, please refer to our code repo and our paper.
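To make "per-channel" concrete, here is a minimal sketch of symmetric per-channel INT4 weight quantization: each output channel (weight row) gets its own scale, and values are rounded into the signed 4-bit range. This is an illustration of the general technique only, not the actual QQQ kernels or packing format.

```python
# Hedged sketch of per-channel symmetric INT4 quantization.
# Pure Python for clarity; real implementations operate on packed tensors.

def quantize_per_channel_int4(weight):
    """weight: list of rows (one per output channel).
    Returns (quantized int rows, per-channel scales)."""
    qweight, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row) or 1.0
        scale = max_abs / 7.0  # map the largest magnitude to +/-7 within INT4's [-8, 7]
        q = [max(-8, min(7, round(w / scale))) for w in row]
        qweight.append(q)
        scales.append(scale)
    return qweight, scales

def dequantize_per_channel(qweight, scales):
    """Reconstruct approximate FP weights from INT4 values and scales."""
    return [[q * s for q in row] for row, s in zip(qweight, scales)]

# Toy example: two output channels with different dynamic ranges.
W = [[0.1, -0.7, 0.35], [1.2, -0.05, 0.6]]
Q, S = quantize_per_channel_int4(W)
W_hat = dequantize_per_channel(Q, S)
```

Because each channel has its own scale, a channel with small weights is not crushed by a large outlier in another channel; the reconstruction error per value is bounded by half that channel's scale.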

Safetensors · Model size: 1.92B params · Tensor types: FP16, F32, I32