dranger003 committed
Commit 396f0bf
Parent: a6b8c22

Update README.md

Files changed (1)
  1. README.md +44 -1
README.md CHANGED
@@ -11,4 +11,47 @@ GGUF version of [c4ai-command-r7b-12-2024](https://huggingface.co/CohereForAI/c4
  ./build/bin/llama-cli -fa --no-display-prompt -c 0 -m ggml-c4ai-command-r-7b-12-2024-q4_k.gguf -p "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Tell me all about yourself.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>"
  ```

- https://github.com/ggerganov/llama.cpp/issues/10816#issuecomment-2548574766
+ https://github.com/ggerganov/llama.cpp/issues/10816#issuecomment-2548574766
+
+ ```
+ llama_new_context_with_model: n_seq_max = 1
+ llama_new_context_with_model: n_ctx = 8192
+ llama_new_context_with_model: n_ctx_per_seq = 8192
+ llama_new_context_with_model: n_batch = 2048
+ llama_new_context_with_model: n_ubatch = 512
+ llama_new_context_with_model: flash_attn = 1
+ llama_new_context_with_model: freq_base = 50000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB
+ llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
+ llama_new_context_with_model: CPU output buffer size = 0.98 MiB
+ llama_new_context_with_model: CUDA0 compute buffer size = 1328.31 MiB
+ llama_new_context_with_model: CUDA_Host compute buffer size = 24.01 MiB
+ llama_new_context_with_model: graph nodes = 841
+ llama_new_context_with_model: graph splits = 324 (with bs=512), 1 (with bs=1)
+ common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192
+ common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
+ main: llama threadpool init, n_threads = 16
+
+ system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
+
+ sampler seed: 2760461191
+ sampler params:
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+ dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 8192
+ top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
+ generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 1
+
+ I am Command, a sophisticated large language model built by the company Cohere. I assist users by providing thorough responses to a wide range of queries, offering information, and performing various tasks. My capabilities include answering questions, generating text, summarizing content, extracting data, and performing various other tasks based on the user's requirements.
+
+ I strive to provide accurate and helpful information while ensuring a positive and informative user experience. Feel free to ask me about any topic, and I'll do my best to assist you! [end of text]
+
+
+ llama_perf_sampler_print: sampling time = 15.07 ms / 128 runs ( 0.12 ms per token, 8491.44 tokens per second)
+ llama_perf_context_print: load time = 1076.84 ms
+ llama_perf_context_print: prompt eval time = 181.62 ms / 22 tokens ( 8.26 ms per token, 121.13 tokens per second)
+ llama_perf_context_print: eval time = 4938.01 ms / 105 runs ( 47.03 ms per token, 21.26 tokens per second)
+ llama_perf_context_print: total time = 5163.42 ms / 127 tokens
+ ```
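
For readers who prefer to drive the model from code rather than llama-cli, here is a minimal sketch that reproduces the same Command R7B turn template using the llama-cpp-python bindings. It is only illustrative and not part of this commit: the `llama_cpp` package, the `Llama` class, and the `n_ctx`/`flash_attn` keyword arguments are assumptions about that library, chosen to mirror the `-c` and `-fa` flags in the llama-cli example above.

```python
# Sketch only; assumes the llama-cpp-python package is installed as `llama_cpp`.
from llama_cpp import Llama

# Load the quantized GGUF file; context size and flash attention mirror the CLI flags.
llm = Llama(
    model_path="ggml-c4ai-command-r-7b-12-2024-q4_k.gguf",
    n_ctx=8192,
    flash_attn=True,  # assumed keyword; omit if your installed version does not accept it
)

# Same Cohere turn format as the -p argument in the llama-cli example.
prompt = (
    "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant."
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Tell me all about yourself."
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>"
)

# Plain completion call; sampling settings roughly match the temp=0.8 shown in the log.
out = llm(prompt, max_tokens=256, temperature=0.8)
print(out["choices"][0]["text"])
```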