Quantizing the model - on our own

#3
by christianweyer - opened

Thanks for this great model!

Since llama.cpp does not support the CogVLMForCausalLM architecture, how can we quantize the model on our own?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The model does not fit the llama.cpp (GGUF) format; we will provide an int4 HF model.
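
Until the official int4 weights land, one way to quantize on your own is to load with 4-bit bitsandbytes quantization through transformers. A minimal sketch, assuming the THUDM/cogvlm-chat-hf checkpoint and tokenizer pairing from the model card, and a CUDA GPU (bitsandbytes does not run on CPU or Apple Silicon):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize the weights to 4-bit NF4 at load time, computing in fp16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# The CogVLM card pairs the model with the Vicuna tokenizer (assumption; check the card).
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",           # assumed checkpoint
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    trust_remote_code=True,           # CogVLM ships custom modeling code
    device_map="auto",                # requires a CUDA GPU for bitsandbytes
)
```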

Great!
How can we run the model then, e.g. on an M3 Mac?

6-bit or 8-bit? 4-bit tends to be quite dumb.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

We will provide 4-bit. It has not been tested on Mac because it uses Triton.
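
Since the Triton-backed int4 path is CUDA-only, the closest option on an M3 Mac is probably running the unquantized fp16 weights on PyTorch's MPS backend. A rough sketch, again assuming the THUDM/cogvlm-chat-hf checkpoint; whether CogVLM's custom modeling code runs cleanly on MPS is untested, and fp16 needs roughly 20+ GB of unified memory:

```python
import torch
from transformers import AutoModelForCausalLM

# Prefer Apple's MPS backend when available, else fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",    # assumed checkpoint
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,    # CogVLM ships custom modeling code
).to(device)
```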
