Is this compatible with kv_cache_dtype set to FP8?
#1, opened by nickandbro
Can you please let me know whether I can set kv_cache_dtype to fp8 in vLLM when using this model?
Also, thank you for doing this for the community to use!
@nickandbro thanks for reporting this! I started looking into it and found a bug. Setting kv_cache_dtype to fp8 should work once this fix lands in vLLM: https://github.com/vllm-project/vllm/pull/6761
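For reference, here is a minimal sketch of how the setting in question is usually passed to vLLM's Python API. It is an illustration, not a tested recipe for this model: `MODEL_ID` is a hypothetical placeholder, the helper names `build_engine_kwargs` and `generate_once` are made up for this example, and the list of accepted dtype strings is an assumption that should be checked against your installed vLLM version.

```python
# Hypothetical placeholder: replace with this model's actual repository ID.
MODEL_ID = "<this-model-repo-id>"

# Assumption: values vLLM has accepted for kv_cache_dtype; verify for your version.
KNOWN_KV_CACHE_DTYPES = {"auto", "fp8", "fp8_e5m2", "fp8_e4m3"}


def build_engine_kwargs(model_id, kv_cache_dtype="fp8"):
    """Return keyword arguments for vllm.LLM with the requested KV cache dtype."""
    if kv_cache_dtype not in KNOWN_KV_CACHE_DTYPES:
        raise ValueError(f"unexpected kv_cache_dtype: {kv_cache_dtype!r}")
    return {"model": model_id, "kv_cache_dtype": kv_cache_dtype}


def generate_once(prompt, model_id=MODEL_ID):
    """Load the model with an FP8 KV cache and generate one short completion."""
    # Imported lazily so build_engine_kwargs works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model_id))
    outputs = llm.generate([prompt], SamplingParams(max_tokens=32))
    return outputs[0].outputs[0].text
```

When serving from the command line instead, the corresponding flag is `--kv-cache-dtype fp8`. Either form is only expected to work with this model on a vLLM build that includes the fix linked above.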
@mgoin Thanks!