There are some stray llama2 </s> tokens for some reason, but tested to work correctly with multiturn w/ the llama3 chat_template:

./server -ngl 99 -m shisa-v1-llama3-8b.Q5_K_M.gguf --chat-template llama3 -fa -v

Note: BF16 GGUFs have no CUDA implementation atm: https://github.com/ggerganov/llama.cpp/issues/7211

Conversion was done from HEAD on 2024-03-27 version: 3005 (d6ef0e77) (closest release is 3006):

#!/bin/bash

cd llama.cpp

echo 'Converting HF to GGUF'
python convert-hf-to-gguf.py --outtype bf16 --outfile /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/hf/shisa-v1-llama3-8b

echo 'Quanting...'
time ./quantize /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/gguf/shisa-v1-llama3-8b.Q8_0.gguf Q8_0
time ./quantize /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/gguf/shisa-v1-llama3-8b.Q6_K.gguf Q6_K
time ./quantize /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/gguf/shisa-v1-llama3-8b.Q5_K_M.gguf Q5_K_M
time ./quantize /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/gguf/shisa-v1-llama3-8b.Q4_K_M.gguf Q4_K_M
time ./quantize /models/gguf/shisa-v1-llama3-8b.bf16.gguf /models/gguf/shisa-v1-llama3-8b.Q4_0.gguf Q4_0
Downloads last month
60
GGUF
Model size
8.03B params
Architecture
llama

4-bit

5-bit

6-bit

8-bit

16-bit

Inference API
Unable to determine this model's library. Check the docs .

Model tree for shisa-ai/shisa-v1-llama3-8b-gguf

Quantized
(1)
this model

Dataset used to train shisa-ai/shisa-v1-llama3-8b-gguf