Is this model based on `chat` or `chat-hf` model of llama2?

#6 · opened by pootow

Llama2 has 4 kinds of models: Llama2, Llama2-hf, Llama2-chat, and Llama2-chat-hf.
Which one is this model based on?

From the first line in the Model card: "These files are GPTQ model files for Meta's Llama 2 13B-chat"

Which links to:
https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

Oh, the information is hidden in the link!

This is 13B Chat, but actually my link is a little wrong. I based this on 13B-Chat, not 13B-Chat-HF. I intended to base it on 13B-Chat-HF, because that's already in the right format for me to quantise, but when I tried, it failed with a weird quantisation problem.

Ultimately 13B-Chat and 13B-Chat-HF should be identical, besides being in different formats (PTH vs pytorch_model.bin / model.safetensors). But I have found problems using the Meta HF format repos.

So in the end, my quants were made like this:

  1. Download 13B Chat PTH files direct from Meta via their download.sh
  2. Convert to HF myself, using Transformers convert_llama_weights_to_hf.py
  3. Then quantise as usual
  4. I also then uploaded the HF files I converted myself to my -fp16 repos.
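Spelled out as commands, the steps above look roughly like this. The paths are placeholders, and the quantisation invocation in step 3 is an illustrative AutoGPTQ-style sketch, not my exact command:

```shell
# 1. Download the original 13B Chat PTH weights with Meta's download.sh
#    (requires the signed download URL Meta emails after licence acceptance)
bash download.sh   # select 13B-chat when prompted

# 2. Convert the PTH checkpoint to HF format using the script that ships
#    with the Transformers repo
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-2-download \
    --model_size 13B \
    --output_dir /path/to/llama-2-13b-chat-hf

# 3. Quantise the converted HF model as usual, e.g. via a GPTQ tool such
#    as AutoGPTQ (quantise_gptq.py and its flags are hypothetical here)
python quantise_gptq.py \
    --pretrained_model_dir /path/to/llama-2-13b-chat-hf \
    --quantized_model_dir /path/to/llama-2-13b-chat-gptq \
    --bits 4 --group_size 128
```

The --output_dir from step 2 is also what I uploaded to the -fp16 repos in step 4.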

I don't know why their HF files are causing problems; I've yet to investigate that.
