add prompt instruction (#9)

Files changed (1)

README.md CHANGED

@@ -36,12 +36,14 @@ You can download only the quants you need instead of cloning the entire reposito
 huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --local-dir . --include '*Q2_K*gguf'
 ```
-## Load sharded model
-`llama_load_model_from_file` will detect the number of files and will load additional tensors from the rest of files.
+## Load GGUF models
+You `MUST` follow the prompt template provided by Llama-3:
 ```sh
-llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q2_K-00001-of-00005.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
+./llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q2_K.gguf -r '<|eot_id|>' --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n" -n 1024
 ```
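The hand-written `-p` string in the new command is long and easy to mistype. As a minimal sketch, a hypothetical POSIX shell helper `llama3_prompt` (an assumption, not part of llama.cpp or this repository) can assemble the same layout from a system message and a first user message. It only reproduces the layout used in this README, including the `\n` after each `<|eot_id|>`, and it assumes, like the command itself, that llama.cpp turns the literal `\n` sequences into newlines.

```sh
#!/bin/sh
# Hypothetical helper (an assumption, not part of llama.cpp or this repo).
# Prints a Llama-3 prompt with the same layout as the -p string in the diff:
# system turn, first user turn, then an open assistant header.
llama3_prompt() {
  sys_msg=$1
  user_msg=$2
  # Two characters, backslash and "n": the same literal sequence that the
  # double-quoted -p argument in the diff passes to llama.cpp.
  nl='\n'
  # %s prints its argument verbatim, so backslashes or % signs in the
  # messages are not interpreted by printf.
  printf '%s' "<|begin_of_text|><|start_header_id|>system<|end_header_id|>${nl}${nl}${sys_msg}<|eot_id|>${nl}<|start_header_id|>user<|end_header_id|>${nl}${nl}${user_msg}<|eot_id|>${nl}<|start_header_id|>assistant<|end_header_id|>${nl}${nl}"
}

# Example (needs a local llama.cpp build and the Q2_K GGUF file):
#   PROMPT=$(llama3_prompt 'You are a helpful, smart, kind, and efficient AI assistant.' 'Hi!')
#   ./llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q2_K.gguf -r '<|eot_id|>' -p "$PROMPT" -n 1024
```

Emitting literal `\n` rather than real line breaks is deliberate: shell command substitution `$(...)` strips trailing newlines, so a helper that printed real newlines would silently lose the blank line after the final assistant header when captured into `PROMPT`.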