tomg-group-umd/Gemstone-256x23
A 50M-parameter model should be at most 50 megabytes or so, not 200 or 100 or whatever.
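For what it's worth, the arithmetic behind those numbers (a rough sketch; real checkpoints carry a bit of extra metadata on top):

```python
# Approximate on-disk size of a 50M-parameter dense model at common precisions:
# 4 bytes/param (fp32) ~ 200 MB, 2 bytes (fp16/bf16) ~ 100 MB, 1 byte (8-bit) ~ 50 MB.
params = 50_000_000
for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("8-bit", 1)]:
    print(f"{precision}: ~{params * bytes_per_param / 1e6:.0f} MB")
```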
If this is a roundabout way of asking for quants, I am sorry - the Gemstone models all seem to lack the tokenizer.model file, which llama.cpp requires to convert to GGUF :(
If you can get the creators to add that file, I am willing to try to quantize all of them, of course.
Cheers!
They're based on the Gemma 2B architecture. Try using a gemma-2b (not 2b-it) tokenizer.
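Something like this is how I'd graft it onto the Gemstone checkpoint (just a sketch; it assumes google/gemma-2b's sentencepiece vocab really matches what the Gemstones were trained with, and that you've accepted the gated-repo license and are logged in):

```python
from huggingface_hub import hf_hub_download
import shutil

# Fetch tokenizer.model from the base Gemma repo (gated: needs an accepted
# license and `huggingface-cli login`).
src = hf_hub_download(repo_id="google/gemma-2b", filename="tokenizer.model")

# Drop it next to the locally cloned Gemstone weights so convert_hf_to_gguf.py
# can find it (adjust the destination to wherever you cloned the repo).
shutil.copy(src, "./Gemstone-256x23/tokenizer.model")
```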
According to README.md: "Using Gemstone-256x23
The Gemstones are based on the gemma-2b architecture and use modeling_gemma.py to run using the transformers library."
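For completeness, loading it through transformers looks roughly like this (my guess at the incantation, not the README's verbatim snippet; the trust_remote_code flag is only needed if the repo actually ships modeling_gemma.py as remote code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tomg-group-umd/Gemstone-256x23"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The Gemstones are", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```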
There is a tokenizer.json, though!
I just tried to convert the model to a source GGUF with the latest llama.cpp, using python convert_hf_to_gguf.py /root/Gemstone-256x23 --outfile /root/Gemstone-256x23.gguf,
and I'm getting the following error:
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 5112, in <module>
    main()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 5106, in main
    model_instance.write()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 440, in write
    self.prepare_metadata(vocab_only=False)
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 433, in prepare_metadata
    self.set_vocab()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 3227, in set_vocab
    self._set_vocab_sentencepiece()
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 792, in _set_vocab_sentencepiece
    tokens, scores, toktypes = self._create_vocab_sentencepiece()
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/llama.cpp/convert_hf_to_gguf.py", line 809, in _create_vocab_sentencepiece
    raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: /root/Gemstone-256x23/tokenizer.model
I haven't looked very deeply into this, but for many models (though not all) the tokenizer.json alone doesn't seem to be enough. There might be a way to convert one into the other (I think it's a slow-tokenizer vs. fast-tokenizer issue), but right now llama.cpp insists on tokenizer.model for some model architectures.
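To illustrate the distinction (untested against this exact repo, so take it as a sketch): the fast tokenizer loads happily from tokenizer.json alone, while the slow sentencepiece path, the one convert_hf_to_gguf.py's _set_vocab_sentencepiece() corresponds to, wants tokenizer.model:

```python
from transformers import AutoTokenizer

repo = "tomg-group-umd/Gemstone-256x23"

# Fast (Rust-backed) tokenizer: tokenizer.json is enough.
fast_tok = AutoTokenizer.from_pretrained(repo)

# Slow sentencepiece tokenizer: needs tokenizer.model, so this should fail on
# the Gemstone repos for the same reason the converter does.
try:
    slow_tok = AutoTokenizer.from_pretrained(repo, use_fast=False)
except Exception as e:
    print(f"slow tokenizer load failed: {e}")
```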
They're also based on the Gemma 2B architecture. Try using a gemma-2b (not 2b-it) tokenizer!
If you (or somebody else) wants to clone it and provide the tokenizer.model, I'll be happy to quantize it.
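For reference, this is the workflow I'd run once a tokenizer.model is in place (same paths as my attempt above; the quant types and the exact quantize binary name may differ depending on your llama.cpp build):

```bash
# Convert the HF checkpoint to a full-precision source GGUF.
python convert_hf_to_gguf.py /root/Gemstone-256x23 \
    --outfile /root/Gemstone-256x23-f16.gguf --outtype f16

# Quantize it (the tool is llama-quantize in current builds, plain quantize in older ones).
./llama-quantize /root/Gemstone-256x23-f16.gguf /root/Gemstone-256x23-Q8_0.gguf Q8_0
```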