Model description
This is a GGUF version of the Meta-Llama-3-8B-OpenOrca model, which is itself meta-llama/Meta-Llama-3-8B fine-tuned on a 1.5k-sample subset of the OpenOrca dataset.
This LLM follows the popular ChatML template!
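For reference, a ChatML prompt is laid out as shown below (the placeholders in braces are illustrative, not literal tokens); this is the same layout the inference code further down builds by hand:
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant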
How to use
# Download the Q4_K_M.gguf or Q6_K.gguf version of the MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF model
!huggingface-cli download MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
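If you prefer to stay inside Python, the same file can be fetched with the huggingface_hub library; a minimal sketch (choose Q4_K_M.gguf or Q6_K.gguf as the filename):
# Download the GGUF file programmatically instead of via the CLI
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF",
    filename="Q4_K_M.gguf",
    local_dir=".",
)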
from llama_cpp import Llama
llm = Llama(
model_path="./content/Q4_K_M.gguf",
n_ctx=0, # input text context length, 0 = from model
verbose = False
)
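If your llama-cpp-python build was compiled with GPU support (CUDA or Metal), you can optionally offload layers to the GPU; this is a sketch, not part of the original recipe:
# Optional: construct the model with GPU offloading (requires a GPU-enabled build)
llm = Llama(
    model_path="./Q4_K_M.gguf",
    n_ctx=0,
    n_gpu_layers=-1,  # -1 = offload every layer; lower this if you run out of VRAM
    verbose=False
)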
# Define a function for inference
def llm_response(input_text='', max_tokens=256):
    system_prompt = "You are a helpful AI assistant."
    prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant"
    output = llm(
        prompt,
        max_tokens=max_tokens,
        stop=["<|im_end|>"],
    )
    return output
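As an alternative to building the ChatML prompt by hand, llama-cpp-python can apply the chat template itself via create_chat_completion; a sketch, assuming the GGUF metadata carries the ChatML template (otherwise pass chat_format="chatml" when constructing Llama):
# Alternative: let llama-cpp-python format the ChatML messages for you
def llm_chat_response(input_text='', max_tokens=256):
    output = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": input_text},
        ],
        max_tokens=max_tokens,
    )
    return output["choices"][0]["message"]["content"]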
# Generate the model response
input_text = "Explain artificial general intelligence (AGI) in a few lines."
result = llm_response(input_text)
print(result['choices'][0]['text'])
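For long answers you can also stream tokens as they are generated instead of waiting for the full completion; a minimal sketch using the same prompt format:
# Stream the completion token by token
system_prompt = "You are a helpful AI assistant."
prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant"
for chunk in llm(prompt, max_tokens=256, stop=["<|im_end|>"], stream=True):
    print(chunk['choices'][0]['text'], end="", flush=True)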