ExLlamaV2 quant (exl2 / 2.2 bpw) made with ExLlamaV2 v0.0.21

Other EXL2 quants:

Quant (bpw)   Model Size   lm_head (bits)
2.2           4176 MB      6
2.5           4519 MB      6
3.0           5143 MB      6
3.5           5766 MB      6
3.75          6077 MB      6
4.0           6391 MB      6
4.25          6703 MB      6
5.0           7637 MB      6
6.0           8992 MB      8
6.5           9616 MB      8
8.0           11473 MB     8
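
As a rough aid for choosing among the quants listed above, the following is a minimal sketch that picks the highest-bpw entry whose file size fits a given memory budget. The function name `largest_quant_that_fits` and the `headroom_mb` allowance are hypothetical, introduced only for illustration; the listed sizes cover the weights only, so real memory use will be higher once the cache and activations are counted.

```python
# (bits per weight, model size in MB) pairs copied from the table above.
QUANTS = [
    (2.2, 4176), (2.5, 4519), (3.0, 5143), (3.5, 5766), (3.75, 6077),
    (4.0, 6391), (4.25, 6703), (5.0, 7637), (6.0, 8992), (6.5, 9616),
    (8.0, 11473),
]


def largest_quant_that_fits(budget_mb, headroom_mb=1024):
    """Return the highest-bpw (bpw, size_mb) pair that fits the budget.

    headroom_mb is a hypothetical allowance for the cache and activations;
    it is an assumption, not a measured figure. Returns None when even the
    smallest quant does not fit.
    """
    usable = budget_mb - headroom_mb
    candidates = [(bpw, size) for bpw, size in QUANTS if size <= usable]
    return max(candidates) if candidates else None


# Example: an 8 GB budget with the default 1 GB of headroom.
print(largest_quant_that_fits(8192))
```

The headroom is deliberately a parameter: longer contexts need a larger cache, so a fixed allowance would be misleading.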

Meta-Llama-3-12B-Instruct

Meta-Llama-3-12B-Instruct is a self-merge of NousResearch/Meta-Llama-3-8B-Instruct, made with LazyMergekit by stacking overlapping layer slices of that one model.

πŸ† Evaluation

Model AGIEval GPT4All TruthfulQA Bigbench Average
Meta-Llama-3-12B-Instruct 41.7 67.71 52.75 40.58 50.69
Meta-Llama-3-12B 29.46 68.01 41.02 35.57 43.52

🧩 Configuration

slices:
  - sources:
    - model: NousResearch/Meta-Llama-3-8B-Instruct
      layer_range: [0,9]
  - sources:
    - model: NousResearch/Meta-Llama-3-8B-Instruct
      layer_range: [5,14]
  - sources:
    - model: NousResearch/Meta-Llama-3-8B-Instruct
      layer_range: [10,19]
  - sources:
    - model: NousResearch/Meta-Llama-3-8B-Instruct
      layer_range: [15,24]
  - sources:
    - model: NousResearch/Meta-Llama-3-8B-Instruct
      layer_range: [20,32]
merge_method: passthrough
dtype: bfloat16
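
To see how the slices above grow the model, here is a small sketch that counts the layers in the merged stack and lists which source layers appear twice. It assumes each `layer_range` is half-open (start included, end excluded); that reading is an assumption about the merge tool's convention, and the helper name `stacked_layers` is hypothetical.

```python
# Slices copied from the configuration above. Each range is treated as
# half-open [start, end); this is an assumption, not confirmed by the card.
SLICES = [(0, 9), (5, 14), (10, 19), (15, 24), (20, 32)]


def stacked_layers(slices):
    """Return the source-layer index used at each position of the merged model."""
    return [layer for start, end in slices for layer in range(start, end)]


layers = stacked_layers(SLICES)
total_layers = len(layers)

# Source layers that occur more than once because adjacent slices overlap.
duplicated = sorted({layer for layer in layers if layers.count(layer) > 1})

print("layers in merged model:", total_layers)
print("source layers used twice:", duplicated)
```

Under that assumption the passthrough merge only repeats existing layers; it does not average or otherwise modify their weights.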

💻 Usage

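# Notebook cell syntax; drop the leading "!" when running in a regular shell.
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

# This loads the original, unquantized merge with transformers, not the EXL2 quant.
model = "mlabonne/Meta-Llama-3-12B-Instruct"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the chat messages into a single prompt string using the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline in half precision, placing weights automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the generated text.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])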
