alternative serving framework
#1 by erichartford
Does it work with TGI? vLLM? SGLang?
With vLLM I get:
ValueError: Unknown quantization method: intel/auto-round. Must be one of ['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'fbgemm_fp8', 'modelopt', 'marlin', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex'].
We have also uploaded the GPTQ format, which is compatible with other frameworks. Please check out revision="6d3d2cf" as detailed in the README.
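For example, here is a minimal sketch of loading the GPTQ revision with vLLM's offline API; the repo id below is a placeholder, so substitute this model's actual Hugging Face id, and note that vLLM may also auto-detect the GPTQ quantization without the explicit flag:

```python
from vllm import LLM, SamplingParams

# Placeholder repo id -- replace with this model's actual Hugging Face id.
MODEL_ID = "<org>/<model>-GPTQ"

# Pin the GPTQ revision mentioned above. "gptq" is in vLLM's supported
# quantization list, unlike the intel/auto-round method from the ValueError.
llm = LLM(
    model=MODEL_ID,
    revision="6d3d2cf",
    quantization="gptq",
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```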