OpenVINO IR model with int8 quantization

Model definition for LocalAI:

name: localai-llama3
backend: transformers
parameters:
  model: fakezeta/LocalAI-Llama3-8b-Function-Call-v0.2-ov-int8
context_size: 8192
type: OVModelForCausalLM
template:
  use_tokenizer_template: true

To run the model directly with LocalAI:

local-ai run huggingface://fakezeta/LocalAI-Llama3-8b-Function-Call-v0.2-ov-int8/model.yaml
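Once the server is running, the model is served through LocalAI's OpenAI-compatible chat completions API (by default on port 8080). A minimal sketch, assuming the default endpoint and the `localai-llama3` model name from the definition above:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for LocalAI."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # Assumes LocalAI is running locally with the model loaded under
    # the name "localai-llama3" from the YAML definition above.
    payload = build_chat_request("localai-llama3", prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request can of course be made with `curl` or any OpenAI client library pointed at the LocalAI base URL.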

LocalAI-Llama3-8b-Function-Call-v0.2



This model is a fine-tune on a custom dataset combined with glaive, built specifically to leverage LocalAI's constrained-grammar features.

Specifically, once the model enters tools mode it will always reply with JSON.
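To illustrate tools mode, a request enables it by passing the standard OpenAI-style tools schema; a minimal sketch (the `get_weather` function definition here is a hypothetical example, not part of the model):

```python
# Hypothetical tool definition for illustration; any OpenAI-style
# function schema is passed to the model the same way.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tools_request(model: str, prompt: str) -> dict:
    """Build a chat completion payload that puts the model in tools mode."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
    }
```

With a payload like this, the model's reply is constrained to JSON, e.g. a tool call whose arguments are an object such as `{"city": "Rome"}`.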

To run on LocalAI:

local-ai run huggingface://mudler/LocalAI-Llama3-8b-Function-Call-v0.2-GGUF/localai.yaml

If you like my work, consider donating so I can get resources for my fine-tunes!
