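There are currently three mainstream quantization formats for large language models:
- GGUF: the file format used by llama.cpp, aimed mainly at CPU inference.
- GPTQ: aimed at GPU inference.
- AWQ: a newer quantization method that is generally reported to give better results than GPTQ.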
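
To make concrete what "quantization" means here, the following is a minimal sketch of plain group-wise round-to-nearest 4-bit quantization in pure Python. It is not the GGUF, GPTQ, or AWQ algorithm; those methods add their own steps on top of this basic store-small-integers-plus-a-scale idea. The function names, the group size, and the sample weights are all assumptions made for illustration.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    qmax = (1 << bits) - 1                 # largest integer code, 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0        # fall back to 1.0 for a constant group
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


def quantize(weights, group_size=4, bits=4):
    """Quantize a flat list of weights, one (codes, scale, offset) per group."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Rebuild the flat list of approximate weights from all groups."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the round trip.
    weights = [0.1, -0.5, 0.3, 0.9, 2.0, -2.0, 0.0, 1.0]
    groups = quantize(weights, group_size=4, bits=4)
    for codes, scale, lo in groups:
        print(codes, scale, lo)
    print(dequantize(groups))
```

Each group keeps only 4-bit integer codes plus one scale and one offset, so the restored weights differ from the originals by at most half a quantization step within that group.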
notebook
Some parts of it still need to be worked through.