tokenizer.model missing?

#2
by darule - opened

I'm sorry if this is a newbie question, but I cannot convert this model because the tokenizer.model is missing here. Is that correct, or is it a mistake? (Or can I generate it myself?)

Same problem, the file is missing in this repo. I wanted to convert it to GGUF to use with llama.cpp.

I tried https://huggingface.co/LeoLM/leo-hessianai-13b/blob/main/tokenizer.model, but then it complains:

Writing models/LeoML-13B/ggml-model-f16.gguf, format 1
Traceback (most recent call last):
  File "/home/whstfuch/appl/llama.cpp/convert.py", line 1193, in <module>
    main()
  File "/home/whstfuch/appl/llama.cpp/convert.py", line 1188, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency)
  File "/home/whstfuch/appl/llama.cpp/convert.py", line 907, in write_all
    check_vocab_size(params, vocab)
  File "/home/whstfuch/appl/llama.cpp/convert.py", line 802, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32128, but models/LeoML-13B/tokenizer.model has 32000).
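
For what it's worth, the mismatch itself is informative: 32128 = 32000 + 128, i.e. the model's embedding matrix was extended with 128 added tokens that the base Llama-2 tokenizer.model doesn't contain. Here is a minimal sketch to confirm this locally, assuming the paths from the traceback above and that sentencepiece is installed and the repo's config.json has been downloaded:

import json
from sentencepiece import SentencePieceProcessor

# Vocab size of the SentencePiece model (the base Llama-2 tokenizer).
sp = SentencePieceProcessor()
sp.Load("models/LeoML-13B/tokenizer.model")
print("tokenizer.model vocab:", sp.vocab_size())   # -> 32000

# Vocab size the model was actually trained with.
with open("models/LeoML-13B/config.json") as f:
    cfg = json.load(f)
print("config.json vocab_size:", cfg["vocab_size"])   # -> 32128 (32000 + 128 added tokens)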
LAION LeoLM org

Llama.cpp has recently implemented support for added tokens in #3475 (see this tweet). This should resolve the issues you've been having.
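
In practice that means updating your llama.cpp checkout and re-running the conversion should be enough; a sketch, assuming the directory layout from the traceback above (the exact convert.py flags may vary between llama.cpp versions):

cd llama.cpp
git pull
python convert.py models/LeoML-13B --outtype f16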

bjoernp changed discussion status to closed
