
Wrong version of llama.cpp used for quanting

#1
by gelukuMLG - opened

I just ran the model using the latest version of koboldcpp, and it says the model needs requanting because it does not use the BPE tokenizer fix.

Backyard AI org

We are using the correct version of llama.cpp unless something went horribly wrong. I've used this model in Faraday and it appears to be working correctly, i.e. post-BPE fix. It might be something odd in how koboldcpp checks for that issue. It's also worth noting that command-r should not even be impacted by the BPE issue, as it's not a Llama 3 model.

https://github.com/ggerganov/llama.cpp/pull/7063

Backyard AI org

Unfortunately that PR did not exist at the time I did the quant. I will redo the quant.
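
For anyone who wants to verify a download before the requant lands, here is a minimal sketch (not Backyard AI's actual workflow) of how one might check whether a GGUF file already carries the pre-tokenizer metadata that post-BPE-fix converters write. It assumes the `gguf` Python package that ships with llama.cpp, and the file name is a hypothetical example.

```python
# Minimal sketch: check whether a GGUF quant carries the `tokenizer.ggml.pre`
# metadata key that llama.cpp's BPE pre-tokenizer fix started writing.
# Assumes the `gguf` Python package (pip install gguf); file name is hypothetical.
import sys
from gguf import GGUFReader

def has_pretokenizer_metadata(path: str) -> bool:
    reader = GGUFReader(path)
    # Quants converted before the BPE pre-tokenizer fix lack this key, which is
    # typically what loaders such as koboldcpp warn about when they suggest requanting.
    return "tokenizer.ggml.pre" in reader.fields

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "c4ai-command-r-v01.Q4_K_M.gguf"
    print(f"{path}: pre-tokenizer metadata present = {has_pretokenizer_metadata(path)}")
```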
