Is this model compatible with kv_cache_dtype set to FP8?

#1
by nickandbro - opened

Can you please let me know whether I can set kv_cache_dtype to fp8 in vLLM when using this model?

Also, thank you for doing this for the community to use!

Neural Magic org

@nickandbro thanks for reporting this! I started looking into it and found a bug. Setting kv_cache_dtype to fp8 should work with this model once the fix lands in vLLM: https://github.com/vllm-project/vllm/pull/6761
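
For reference, here is a rough sketch of what enabling the FP8 KV cache could look like through vLLM's Python API once the fix is available. The helper names `build_engine_kwargs` and `load_with_fp8_kv_cache` are hypothetical and only for illustration, and the model id is a placeholder you would swap for this repo's id. It assumes a vLLM build that includes the fix and a GPU that supports FP8.

```python
# Sketch only: the helper names are hypothetical, not part of vLLM.
# `kv_cache_dtype` is the vLLM engine argument the question refers to.


def build_engine_kwargs(model_id: str, kv_cache_dtype: str = "fp8") -> dict:
    """Collect the keyword arguments passed to vllm.LLM."""
    return {"model": model_id, "kv_cache_dtype": kv_cache_dtype}


def load_with_fp8_kv_cache(model_id: str):
    """Create a vLLM engine with the KV cache stored in FP8."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM

    return LLM(**build_engine_kwargs(model_id))


# Example (placeholder model id, replace with this repo's id):
# llm = load_with_fp8_kv_cache("<this-model-repo-id>")
# print(llm.generate("Hello")[0].outputs[0].text)
```

When serving from the command line, the equivalent setting should be the `--kv-cache-dtype fp8` flag.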
