OpenVINO IR model with int8 quantization

Model definition for LocalAI:

name: localai-llama3
backend: transformers
parameters:
  model: fakezeta/LocalAI-Llama3-8b-Function-Call-v0.2-ov-int8
context_size: 8192
type: OVModelForCausalLM
template:
  use_tokenizer_template: true

To run the model directly with LocalAI:

local-ai run huggingface://fakezeta/LocalAI-Llama3-8b-Function-Call-v0.2-ov-int8/model.yaml
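Once the server is running, the model is served through LocalAI's OpenAI-compatible chat completions API (by default on port 8080). A minimal sketch, assuming the default endpoint and the `localai-llama3` model name from the definition above:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for LocalAI."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # Assumes LocalAI is running locally with the model loaded under
    # the name "localai-llama3" from the YAML definition above.
    payload = build_chat_request("localai-llama3", prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request can of course be made with `curl` or any OpenAI client library pointed at the LocalAI base URL.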

LocalAI-Llama3-8b-Function-Call-v0.2



This model is a fine-tune on a custom dataset combined with glaive, built specifically to leverage LocalAI's constrained-grammar features.

Specifically, once the model enters tools mode it will always reply with JSON.
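To illustrate tools mode, a request enables it by passing the standard OpenAI-style tools schema; a minimal sketch (the `get_weather` function definition here is a hypothetical example, not part of the model):

```python
# Hypothetical tool definition for illustration; any OpenAI-style
# function schema is passed to the model the same way.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tools_request(model: str, prompt: str) -> dict:
    """Build a chat completion payload that puts the model in tools mode."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }
```

With a payload like this, the model's reply is constrained to JSON, e.g. a tool call whose arguments are an object such as `{"city": "Rome"}`.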

To run on LocalAI:

local-ai run huggingface://mudler/LocalAI-Llama3-8b-Function-Call-v0.2-GGUF/localai.yaml

If you like my work, consider donating so I can get resources for my fine-tunes!
