Anyone else encountering bad quantized(?) performance with Llama3-70B?
#37
by philjd
I've been trying to run Llama3 70B with int8 and NF4 quantization on a single A100, but the outputs seem quite broken.
Is anybody else encountering similar issues?
Example breakages include doubled commas, dates inserted in random places (even when, e.g., asking for a poem), and repeated words.
I've found a few other threads that suggest the Llama3 models may be particularly susceptible to quality degradation from quantization.
Unfortunately I don't have a machine that can run the bfloat16 version.
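In case it helps reproduce, this is roughly what I'm running, a minimal sketch assuming the standard transformers + bitsandbytes path (the repo id here is my assumption; adjust it to whichever checkpoint you're using):

```python
# Rough sketch of my setup (simplified; assumes transformers + bitsandbytes installed)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed repo id

# NF4 path; for the int8 runs I swap this config for load_in_8bit=True
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```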
I encountered the same issue with int8 quantization.
One workaround is to enable group-wise quantization with a group size of 128 or 64.
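A group-wise 4-bit setup could look something like the sketch below. This assumes the transformers GPTQ integration (optimum + auto-gptq installed); the original post didn't specify a toolchain, so treat the exact config and calibration dataset as placeholders. Note that quantizing a 70B model this way requires a calibration pass and is not a quick load-time flag.

```python
# Sketch of group-wise 4-bit quantization via GPTQ (assumes transformers + optimum + auto-gptq)
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,
    group_size=128,   # try 64 if 128 still degrades output quality
    dataset="c4",     # calibration data for the quantization pass
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)
```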