dranger003 committed 396f0bf (parent: a6b8c22): Update README.md

README.md (changed)
GGUF version of [c4ai-command-r7b-12-2024](https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024)

```
./build/bin/llama-cli -fa --no-display-prompt -c 0 -m ggml-c4ai-command-r-7b-12-2024-q4_k.gguf -p "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Tell me all about yourself.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>"
```
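
The `-p` string above is the raw Command R7B turn format: a system turn, a user turn, and an opened chatbot turn ending in `<|START_RESPONSE|>` so the model continues with its reply. A minimal sketch of assembling that prompt from plain message strings (the `build_prompt` helper is illustrative, not part of llama.cpp):

```python
def build_prompt(system: str, user: str) -> str:
    # Assemble the Command R7B turn format used in the -p argument above.
    # The special-token strings are copied from that command; this helper is illustrative only.
    return (
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{user}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>"
    )

if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Tell me all about yourself."))
```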

https://github.com/ggerganov/llama.cpp/issues/10816#issuecomment-2548574766

```
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_ctx_per_seq = 8192
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 1
llama_new_context_with_model: freq_base = 50000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.98 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 1328.31 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 24.01 MiB
llama_new_context_with_model: graph nodes = 841
llama_new_context_with_model: graph splits = 324 (with bs=512), 1 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 16

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

sampler seed: 2760461191
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 8192
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 1

I am Command, a sophisticated large language model built by the company Cohere. I assist users by providing thorough responses to a wide range of queries, offering information, and performing various tasks. My capabilities include answering questions, generating text, summarizing content, extracting data, and performing various other tasks based on the user's requirements.

I strive to provide accurate and helpful information while ensuring a positive and informative user experience. Feel free to ask me about any topic, and I'll do my best to assist you! [end of text]


llama_perf_sampler_print: sampling time = 15.07 ms / 128 runs ( 0.12 ms per token, 8491.44 tokens per second)
llama_perf_context_print: load time = 1076.84 ms
llama_perf_context_print: prompt eval time = 181.62 ms / 22 tokens ( 8.26 ms per token, 121.13 tokens per second)
llama_perf_context_print: eval time = 4938.01 ms / 105 runs ( 47.03 ms per token, 21.26 tokens per second)
llama_perf_context_print: total time = 5163.42 ms / 127 tokens
```
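
The `KV self size = 1024.00 MiB` line is consistent with an f16 cache over the full 8192-token context. A rough check, assuming 32 layers, 8 KV heads, and a head dimension of 128 for this model (those values are not printed in the log):

```python
# Rough check of the "KV self size" figure in the log above.
n_ctx      = 8192   # from the log
bytes_f16  = 2      # K and V are cached as f16 per the log
n_layer    = 32     # assumed architecture value, not in the log
n_kv_heads = 8      # assumed
head_dim   = 128    # assumed

per_token = 2 * n_layer * n_kv_heads * head_dim * bytes_f16  # K + V across all layers
print(f"{n_ctx * per_token / 2**20:.2f} MiB")  # -> 1024.00 MiB, matching the log
```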
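
For reference, the active stages of the logged sampler chain (penalties, DRY, typical-p and XTC are left at neutral settings in this run) reduce to top-k, then top-p, then min-p, then temperature, then a draw from the remaining distribution. A minimal sketch with the logged parameters; this is an illustration, not llama.cpp's sampler implementation:

```python
import numpy as np

def sample(logits, top_k=40, top_p=0.95, min_p=0.05, temp=0.8, rng=np.random.default_rng()):
    # top-k: keep only the k highest-logit tokens
    kept = np.argsort(logits)[::-1][:top_k]
    probs = np.exp(logits[kept] - logits[kept].max())
    probs /= probs.sum()

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    kept, probs = kept[order[:cutoff]], probs[order[:cutoff]]

    # min-p: drop tokens whose probability is below min_p times the best one
    mask = probs >= min_p * probs.max()
    kept, probs = kept[mask], probs[mask]

    # temperature on the surviving distribution, then draw a token id
    probs = probs ** (1.0 / temp)
    probs /= probs.sum()
    return int(rng.choice(kept, p=probs))

logits = np.random.default_rng(0).standard_normal(256_000)  # stand-in logits
print(sample(logits))
```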