
# Llama-3-pineapple-2x8B

Llama-3-pineapple-2x8B is a 13.7B-parameter Mixture of Experts (MoE) model made with the following models:

* [fhnw/Llama-3-8B-pineapple-pizza-orpo](https://huggingface.co/fhnw/Llama-3-8B-pineapple-pizza-orpo)
* [fhnw/Llama-3-8B-pineapple-recipe-sft](https://huggingface.co/fhnw/Llama-3-8B-pineapple-recipe-sft)

## Configuration

```yaml
base_model: fhnw/Llama-3-8B-pineapple-pizza-orpo
experts:
- source_model: fhnw/Llama-3-8B-pineapple-pizza-orpo
  positive_prompts: ["assistant", "chat"]
- source_model: fhnw/Llama-3-8B-pineapple-recipe-sft
  positive_prompts: ["recipe"]
gate_mode: hidden
dtype: float16
```
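
With `gate_mode: hidden`, each expert's router weights are initialized from hidden-state representations of its `positive_prompts`, so chat-style inputs are steered toward the ORPO model and recipe requests toward the SFT model. As a minimal sketch of reproducing the merge (assuming [mergekit](https://github.com/arcee-ai/mergekit) is installed and the YAML above is saved as `config.yaml`; the output path is illustrative):

```python
# Minimal sketch: invoke mergekit's MoE merge script on the config above.
# Assumes mergekit is installed (pip install mergekit) and that config.yaml
# contains the configuration shown in this card.
import subprocess

subprocess.run(
    ["mergekit-moe", "config.yaml", "./Llama-3-pineapple-2x8B"],
    check=True,
)
```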

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "fhnw/Llama-3-pineapple-2x8B"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(device)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Is pineapple on a pizza a crime?"}
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(device)

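# Stop generation on either the default EOS token or Llama 3's end-of-turn token.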
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
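# Keep only the newly generated tokens (everything after the prompt).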
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
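
The same chat can also be run through the higher-level `pipeline` API. A minimal sketch, assuming a recent transformers version that accepts chat messages in the text-generation pipeline (generation settings mirror the example above):

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="fhnw/Llama-3-pineapple-2x8B",
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; use device=0 for a single GPU
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Is pineapple on a pizza a crime?"},
]

outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# The pipeline returns the full conversation; the last message is the reply.
print(outputs[0]["generated_text"][-1]["content"])
```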