Trying to quantize the model using llama.cpp
#4 opened by Fabian96
Hi,
I get the error (model has 32128, but models/LionM-70B/tokenizer.model has 32000) when trying to convert the model to FP16 format before quantization. This seems to be a mismatch between the tokenizer that was used and the "vocab_size" in config.json.
An easy fix is to just set the "vocab_size" parameter to 32000; however, this results in problems further down the line when quantizing.
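For reference, here is a minimal sketch of how I checked where the 32128 vs. 32000 mismatch comes from, assuming tokenizer.model is a standard SentencePiece model and the paths match the ones in the error message:

```python
# Sketch: compare the vocab size reported by config.json with the one in
# tokenizer.model before running llama.cpp's convert script.
# Assumption: tokenizer.model is a SentencePiece model (as for LLaMA-style models).
import json
from sentencepiece import SentencePieceProcessor

config = json.load(open("models/LionM-70B/config.json"))
sp = SentencePieceProcessor(model_file="models/LionM-70B/tokenizer.model")

print("config.json vocab_size:", config["vocab_size"])  # reports 32128
print("tokenizer.model vocab :", sp.vocab_size())        # reports 32000
```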
Any suggestions?