GPTQ, AWQ, GGUF version request

#1
by Komposter43 - opened

@TheBloke
Would you like to make quantized versions of this model for the community? Thank you.

@TheBloke please-please

@TheBloke
Hello, this is the future GOAT!
Please, please, change the priority order; we need GGUF + AWQ & GPTQ :-)

I'm having a hard time finding the hardware to do multiple quants for 70B models. I'll try to get it done later today.

Shouldn't it have a model card first?

That doesn't mean it shouldn't have a model card


So maybe start with the most popular quants? As for me, I mostly need only the GGUF Q4_K_M for 70B, as it's the right size to fit on one 48 GB card or two 24 GB cards.
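A rough back-of-the-envelope check of why Q4_K_M works for a 70B model at 48 GB (a sketch; the bits-per-weight figures below are approximate averages I'm assuming, not exact llama.cpp numbers, and real file sizes vary with the tensor mix and metadata):

```python
# Rough file-size estimate for a 70B model at common GGUF quantization levels.
PARAMS = 70e9  # parameter count

# Approximate average bits per weight (assumed values, not official figures).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.69,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.91,
}

def size_gib(params: float, bpw: float) -> float:
    """Approximate file size in GiB: params * bits_per_weight / 8 bytes."""
    return params * bpw / 8 / 2**30

for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{size_gib(PARAMS, bpw):.1f} GiB")
```

Under these assumptions, Q4_K_M lands around 40 GiB, which leaves headroom for the KV cache on a 48 GB card, while Q5_K_M already pushes past it.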


How is the quantization process related to the model card?

You can see the card from version 1.1: https://huggingface.co/ICBU-NPU/FashionGPT-70B-V1.1

There are only small differences.

Quants for this model are starting now

Hmm, looks like the prompt format from v1.1 does not work properly with v1.2 :(

@TheBloke

Just found another 70B model that stole the #1 spot on the Open LLM Leaderboard just now: https://huggingface.co/ValiantLabs/ShiningValiant/tree/main. Shame it does not have a discussion section, so I have to place the GPTQ request here.

At least it has a model card.


Komposter43 changed discussion status to closed
