tomg-group-umd/Gemstone-256x23

#685
by Austinkeith2010 - opened

A 50M model should be at most 50 megabytes, not 200 or 100 or whatever.

If this is a roundabout way of asking for quants, I am sorry - the Gemstone models all seem to lack the tokenizer.model file, which llama.cpp requires to convert them to GGUF :(

If you can get the creators to add that file, I am willing to try to quantize all of them, of course.

Cheers!

mradermacher changed discussion status to closed

They're based on the Gemma 2b architecture. Try using a gemma-2b (not 2b-it) tokenizer.
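For reference, a minimal sketch of that idea: download the checkpoint locally and drop the gemma-2b tokenizer.model next to it before running the converter. This assumes the Gemma tokenizer really is compatible with the Gemstones (suggested above, but not verified here), and that you have accepted the Gemma license, since google/gemma-2b is gated.

import shutil
from huggingface_hub import hf_hub_download, snapshot_download

# Pull the Gemstone checkpoint into a plain local directory.
gemstone_dir = snapshot_download("tomg-group-umd/Gemstone-256x23", local_dir="Gemstone-256x23")

# Grab tokenizer.model from the base gemma-2b repo (gated: you must have
# accepted the Gemma license and be logged in, e.g. via `huggingface-cli login`).
gemma_tok = hf_hub_download("google/gemma-2b", filename="tokenizer.model")

# Place the SentencePiece file next to the weights so convert_hf_to_gguf.py can find it.
shutil.copy(gemma_tok, f"{gemstone_dir}/tokenizer.model")
print("tokenizer.model placed in", gemstone_dir)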

According to README.md: "Using Gemstone-256x23
The Gemstones are based on the gemma-2b architecture and use modeling_gemma.py to run using the transformers library."
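For what it's worth, running it through transformers looks roughly like this (just a sketch; the prompt, dtype, and generation settings are illustrative, and trust_remote_code is what pulls in the repo's modeling_gemma.py):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tomg-group-umd/Gemstone-256x23"

# tokenizer.json is present, so the fast tokenizer loads fine here.
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.float32)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))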

There is a tokenizer.json though!

I just manually tried to convert the model to a source GGUF with the latest llama.cpp, using python convert_hf_to_gguf.py /root/Gemstone-256x23 --outfile /root/Gemstone-256x23.gguf, and I'm getting the following error:

INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
    main()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 5106, in main
    model_instance.write()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 3227, in set_vocab
    self._set_vocab_sentencepiece()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 792, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 809, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /root/Gemstone-256x23/tokenizer.model

I haven't looked extremely deeply into this, but for many models (though not all) the tokenizer.json doesn't seem to be enough. There might actually be a way to convert one into the other (I think when it's an old tokenizer vs. fast tokenizer issue), but right now llama.cpp insists on the file for some model architectures.
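A quick way to see the mismatch is that transformers' fast tokenizer is perfectly happy with tokenizer.json alone, while llama.cpp's Gemma path (the _set_vocab_sentencepiece call in the traceback above) insists on tokenizer.model. A small illustrative check, assuming the same local path as in the conversion attempt:

from pathlib import Path
from transformers import AutoTokenizer

model_dir = Path("/root/Gemstone-256x23")  # path from the conversion attempt above
print("tokenizer files:", sorted(p.name for p in model_dir.glob("tokenizer*")))

# This works via tokenizer.json alone.
tok = AutoTokenizer.from_pretrained(model_dir)
print("fast tokenizer:", tok.is_fast, "vocab size:", tok.vocab_size)

# llama.cpp's requirement boils down to this file existing, which is what
# raises the FileNotFoundError above.
print("tokenizer.model present:", (model_dir / "tokenizer.model").is_file())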

They're based on the Gemma 2B architecture, as mentioned above - try using a gemma-2b (not 2b-it) tokenizer!

If you (or somebody else) wants to clone it and provide the tokenizer.model, I'll be happy to quantize it.
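A sketch of what such a clone could look like, assuming the gemma-2b tokenizer is the right one and reusing the local "Gemstone-256x23" directory from the snippet further up (original weights plus the borrowed tokenizer.model); "your-username/Gemstone-256x23" is a hypothetical repo id:

from huggingface_hub import HfApi

api = HfApi()
clone_id = "your-username/Gemstone-256x23"  # hypothetical clone repo id

# Create the target repo and push the local folder, which now includes tokenizer.model.
api.create_repo(clone_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="Gemstone-256x23", repo_id=clone_id)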
