Any plans for gguf format?
I see that the only quantized format available is GPTQ. Any chance we will get GGUF for those of us who are not using Nvidia hardware?
Assuming you have the weights for AI-Sweden-Models/gpt-sw3-20b-instruct in a folder with the name gpt-sw3-20b-instruct and you want a high-quality 5-bit model:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
python -m venv venv
. venv/bin/activate
python -m pip install -r requirements/requirements-convert-hf-to-gguf.txt
python convert-hf-to-gguf.py ../gpt-sw3-20b-instruct --outfile gpt-sw3-20b-instruct-f16.gguf
./quantize gpt-sw3-20b-instruct-f16.gguf gpt-sw3-20b-instruct-q5_k_m.gguf q5_k_m
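If the conversion and quantization succeed, a quick sanity check is to run the quantized file through llama.cpp's main binary (a minimal sketch; the prompt is just a placeholder and the exact flags vary between llama.cpp revisions):
./main -m gpt-sw3-20b-instruct-q5_k_m.gguf -p "Hej! Berätta om Sverige." -n 128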
There you go :-)
Peter
Thank you! Will try that out!
I tried it with gpt-sw3-6.7b-v2-instruct first, before trying the larger model, but I get this error:
python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
    main()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
    model_instance.set_vocab()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
    self._set_vocab_gpt2()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 247, in _set_vocab_gpt2
    vocab_size = hparams.get("vocab_size", len(tokenizer.vocab))
                                               ^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'
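For what it's worth, the crash happens because GPTSw3Tokenizer is SentencePiece-based and never defines a .vocab attribute, while the script's GPT-2/BPE code path assumes one. All transformers tokenizers do expose get_vocab(), so the lookups the script needs can be built from that instead (a minimal sketch, untested against this exact script revision):
# Minimal sketch: build the mappings convert-hf-to-gguf.py wants
# from get_vocab(), which is part of the common tokenizer API,
# instead of the BPE-only .vocab attribute.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"
)

vocab = tokenizer.get_vocab()  # dict: token string -> token id
vocab_size = len(vocab)
reverse_vocab = {tok_id: tok for tok, tok_id in vocab.items()}
print(vocab_size, max(vocab.values()) < vocab_size)
Note that tokenizer.vocab may be read in more than one place in the script, so every occurrence needs the same treatment.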
@timpal0l It did not help, still getting the error:
python3 convert-hf-to-gguf.py models/gpt-sw3-6.7b-v2-instruct --outfile models/models--AI-Sweden-Models--gpt-sw3-6.7b-v2-instruct.gguf
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
    main()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
    model_instance.set_vocab()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
    self._set_vocab_gpt2()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 248, in _set_vocab_gpt2
    assert max(tokenizer.vocab.values()) < vocab_size
               ^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'
I just tried it myself. The issues go further than vocab vs get_vocab(). Once all that is fixed, it does not know what to do with the self-attention bias.
It might be necessary for someone with intimate knowledge of the gpt-sw3 architecture to amend one of the llama.cpp convert scripts (or create a custom one).
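If someone does take that on: the convert script dispatches on per-architecture Model subclasses, so the vocab part of a fix might look roughly like this (a hypothetical sketch only; GPTSw3Model is not in upstream llama.cpp, and it assumes a script revision that provides the _set_vocab_sentencepiece helper):
# Hypothetical sketch for convert-hf-to-gguf.py; the class name and
# helper are assumptions, not upstream code.
class GPTSw3Model(Model):
    def set_vocab(self):
        # GPT-SW3 ships a SentencePiece tokenizer, so use the script's
        # SentencePiece vocab path instead of the GPT-2/BPE path that
        # crashes on tokenizer.vocab.
        self._set_vocab_sentencepiece()
The tensor mapping, including the self-attention bias mentioned above, would still need to be worked out separately.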
Tried that as well now, @timpal0l, and I'm still getting:
Loading model: gpt-sw3-6.7b-v2-instruct
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
Traceback (most recent call last):
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1246, in <module>
    main()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 1233, in main
    model_instance.set_vocab()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 52, in set_vocab
    self._set_vocab_gpt2()
  File "/Users/admin/scripts/llama.cpp/convert-hf-to-gguf.py", line 250, in _set_vocab_gpt2
    reverse_vocab = {id: encoded_tok for encoded_tok, id in tokenizer.vocab.items()}
                                                            ^^^^^^^^^^^^^^^
AttributeError: 'GPTSw3Tokenizer' object has no attribute 'vocab'