llama.cpp Command-R GGUF with fixed pre-tokenizer

main: build = 2789 (84250014)
main: built with gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0 for x86_64-linux-gnu
main: quantizing '/gguf/c4ai-commandr-v01_a.gguf' to '/gguf/c4ai-command-r-v01-Q5_K_M.gguf' as Q5_K_M
llama_model_loader: loaded meta data with 26 key-value pairs and 322 tensors from c4ai-commandr-v01_a.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = command-r
llama_model_loader: - kv   1:                      command-r.block_count u32              = 40
llama_model_loader: - kv   2:                   command-r.context_length u32              = 131072
llama_model_loader: - kv   3:                 command-r.embedding_length u32              = 8192
llama_model_loader: - kv   4:              command-r.feed_forward_length u32              = 22528
llama_model_loader: - kv   5:             command-r.attention.head_count u32              = 64
llama_model_loader: - kv   6:          command-r.attention.head_count_kv u32              = 64
llama_model_loader: - kv   7:                   command-r.rope.freq_base f32              = 8000000.000000
llama_model_loader: - kv   8:     command-r.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv   9:                          general.file_type u32              = 1
llama_model_loader: - kv  10:                      command-r.logit_scale f32              = 0.062500
llama_model_loader: - kv  11:                command-r.rope.scaling.type str              = none
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  13:                         tokenizer.ggml.pre str              = command-r
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,256000]  = ["<PAD>", "<UNK>", "<CLS>", "<SEP>", ...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,253333]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ a...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 5
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 255001
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:           tokenizer.chat_template.tool_use str              = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv  23:                tokenizer.chat_template.rag str              = {{ bos_token }}{% if messages[0]['rol...
llama_model_loader: - kv  24:                   tokenizer.chat_templates arr[str,2]       = ["rag", "tool_use"]
llama_model_loader: - kv  25:                    tokenizer.chat_template str              = {{ bos_token }}{% if messages[0]['rol...
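The field that makes this GGUF "fixed" is kv 13, `tokenizer.ggml.pre = command-r`, which tells newer llama.cpp builds which BPE pre-tokenizer regex to apply. As a minimal sketch of how such a field is stored, the snippet below encodes and decodes one string key/value pair in the GGUF v3 layout (u64 length-prefixed UTF-8 key, u32 value-type tag, then the value encoded like the key). The helper names are illustrative, not part of any library:

```python
import struct

# GGUF v3 value-type id for strings (from the GGUF spec)
GGUF_TYPE_STRING = 8

def encode_string_kv(key: str, value: str) -> bytes:
    """Encode one string KV pair as it appears in a GGUF v3 header:
    u64 key length + UTF-8 key bytes, u32 value type, then the value
    encoded the same way as the key. Illustrative helper, not a library API."""
    kb, vb = key.encode("utf-8"), value.encode("utf-8")
    out = struct.pack("<Q", len(kb)) + kb
    out += struct.pack("<I", GGUF_TYPE_STRING)
    out += struct.pack("<Q", len(vb)) + vb
    return out

def decode_string_kv(buf: bytes, offset: int = 0):
    """Decode a single string KV pair, returning (key, value, next_offset)."""
    (klen,) = struct.unpack_from("<Q", buf, offset); offset += 8
    key = buf[offset:offset + klen].decode("utf-8"); offset += klen
    (vtype,) = struct.unpack_from("<I", buf, offset); offset += 4
    assert vtype == GGUF_TYPE_STRING, "only string values handled in this sketch"
    (vlen,) = struct.unpack_from("<Q", buf, offset); offset += 8
    value = buf[offset:offset + vlen].decode("utf-8"); offset += vlen
    return key, value, offset

# Round-trip the pre-tokenizer field shown as kv 13 in the dump above.
blob = encode_string_kv("tokenizer.ggml.pre", "command-r")
key, value, _ = decode_string_kv(blob)
print(key, "=", value)  # tokenizer.ggml.pre = command-r
```

Older conversions of this model lack the `tokenizer.ggml.pre` key entirely, which is why llama.cpp builds from around 2789 onward warn about (or mis-tokenize with) such files.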
Format: GGUF
Model size: 35B params
Architecture: command-r
Available quantizations: 4-bit, 5-bit, 6-bit
