Model description
This is a GGUF version of the Meta-Llama-3-8B-OpenOrca model, which is itself meta-llama/Meta-Llama-3-8B fine-tuned on a 1.5k-sample subset of the OpenOrca dataset.
This LLM follows the popular ChatML template!
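For reference, a ChatML prompt is laid out as shown below (the placeholders in braces are illustrative, not literal tokens); this is the same layout the inference code further down builds by hand:
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant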
How to use
# Download the Q4_K_M.gguf or Q6_K.gguf version of the MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF model
!huggingface-cli download MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
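If you prefer to stay inside Python, the same file can be fetched with the huggingface_hub library; a minimal sketch (choose Q4_K_M.gguf or Q6_K.gguf as the filename):
# Download the GGUF file programmatically instead of via the CLI
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="MuntasirHossain/Meta-Llama-3-8B-OpenOrca-GGUF",
    filename="Q4_K_M.gguf",
    local_dir=".",
)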
from llama_cpp import Llama
llm = Llama(
model_path="./content/Q4_K_M.gguf",
n_ctx=0, # input text context length, 0 = from model
verbose = False
)
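If your llama-cpp-python build was compiled with GPU support (CUDA or Metal), you can optionally offload layers to the GPU; this is a sketch, not part of the original recipe:
# Optional: construct the model with GPU offloading (requires a GPU-enabled build)
llm = Llama(
    model_path="./Q4_K_M.gguf",
    n_ctx=0,
    n_gpu_layers=-1,  # -1 = offload every layer; lower this if you run out of VRAM
    verbose=False
)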
# Define a function for inference
def llm_response(input_text='', max_tokens=256):
    system_prompt = "You are a helpful AI assistant."
    prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant"
    output = llm(
        prompt,
        max_tokens=max_tokens,
        stop=["<|im_end|>"],
    )
    return output
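As an alternative to building the ChatML prompt by hand, llama-cpp-python can apply the chat template itself via create_chat_completion; a sketch, assuming the GGUF metadata carries the ChatML template (otherwise pass chat_format="chatml" when constructing Llama):
# Alternative: let llama-cpp-python format the ChatML messages for you
def llm_chat_response(input_text='', max_tokens=256):
    output = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": input_text},
        ],
        max_tokens=max_tokens,
    )
    return output["choices"][0]["message"]["content"]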
# Generate the model response
input_text = "Explain artificial general intelligence (AGI) in a few lines."
result = llm_response(input_text)
print(result['choices'][0]['text'])
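For long answers you can also stream tokens as they are generated instead of waiting for the full completion; a minimal sketch using the same prompt format:
# Stream the completion token by token
system_prompt = "You are a helpful AI assistant."
prompt = f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant"
for chunk in llm(prompt, max_tokens=256, stop=["<|im_end|>"], stream=True):
    print(chunk['choices'][0]['text'], end="", flush=True)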