Anyone else encountering bad quantized(?) performance with Llama3-70B?
#37
by philjd
I've been trying to run Llama3 70B with int8 and NF4 quantization on a single A100, but the outputs seem quite broken.
Is anybody else encountering similar issues?
Example breakages include doubled commas, dates inserted in random places (even when, e.g., asking for a poem), and repeated words.
I've found a few other threads that suggest the Llama3 models may be particularly susceptible to quality degradation from quantization.
Unfortunately I don't have a machine that can run the bfloat16 version.
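In case it helps reproduce, this is roughly what I'm running, a minimal sketch assuming the standard transformers + bitsandbytes path (the repo id here is my assumption; adjust it to whichever checkpoint you're using):

```python
# Rough sketch of my setup (simplified; assumes transformers + bitsandbytes installed)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed repo id

# NF4 path; for the int8 runs I swap this config for load_in_8bit=True
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```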
I encountered the same issue with int8 quantization.
One workaround is to enable group-wise quantization with a group size of 128 or 64.
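A group-wise 4-bit setup could look something like the sketch below. This assumes the transformers GPTQ integration (optimum + auto-gptq installed); the original post didn't specify a toolchain, so treat the exact config and calibration dataset as placeholders. Note that quantizing a 70B model this way requires a calibration pass and is not a quick load-time flag.

```python
# Sketch of group-wise 4-bit quantization via GPTQ (assumes transformers + optimum + auto-gptq)
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,
    group_size=128,   # try 64 if 128 still degrades output quality
    dataset="c4",     # calibration data for the quantization pass
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)
```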