gobean: Quantized to q4_0, q5_0, and q8_0, the favorite formats for Mixtral models. The EOS token ID is manually set to 2 to work around a llama.cpp bug.
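
For a rough sense of how large each quant is, the sketch below estimates file size from the block layout of the three formats. It assumes the 46.7B parameter count listed for this model and that every tensor is stored in the named format. Real GGUF files typically differ somewhat, since some tensors are kept at other precisions and the file also carries metadata. The helper names `bits_per_weight` and `estimated_size_gb` are made up for this illustration.

```python
# Bytes per 32-weight block in llama.cpp's legacy quantization formats.
# Each block stores one fp16 scale (2 bytes) plus the packed weights.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = {
    "q4_0": 2 + 16,      # 4-bit weights: 32 * 4 / 8 = 16 bytes
    "q5_0": 2 + 4 + 16,  # 4 low bits per weight (16 bytes) plus 32 high bits (4 bytes)
    "q8_0": 2 + 32,      # 8-bit weights: 32 bytes
}


def bits_per_weight(quant: str) -> float:
    """Effective bits per weight, including the per-block scale."""
    return BLOCK_BYTES[quant] * 8 / BLOCK_WEIGHTS


def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough size in GB (10^9 bytes), ASSUMING every tensor uses `quant`."""
    return n_params * bits_per_weight(quant) / 8 / 1e9


N_PARAMS = 46.7e9  # parameter count listed for this model

for quant in BLOCK_BYTES:
    print(f"{quant}: {bits_per_weight(quant):.1f} bits/weight, "
          f"~{estimated_size_gb(N_PARAMS, quant):.1f} GB")
```

The per-block scale is why q4_0 costs 4.5 bits per weight and not 4, and likewise 5.5 and 8.5 for the other two formats.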

- Original Model Card -

https://huggingface.co/abhishek/autotrain-mixtral-8x7b-orpo-v1

Model Trained Using AutoTrain

This model was trained using AutoTrain. For more information, please visit AutoTrain.

Usage


from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "PATH_TO_THIS_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
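# With device_map="auto", send the inputs to the model's own device instead of hard-coding 'cuda'
output_ids = model.generate(input_ids.to(model.device))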
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)
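
The decode step above drops the prompt by slicing at `input_ids.shape[1]` and relies on `skip_special_tokens=True` to hide the end-of-sequence token. A minimal sketch of the same idea on plain Python lists follows, assuming the EOS id of 2 mentioned in the note at the top. The function `extract_completion` and all token ids are hypothetical and chosen only for illustration.

```python
from typing import List

EOS_TOKEN_ID = 2  # assumption: the EOS id named in the quantizer's note


def extract_completion(output_ids: List[int], prompt_len: int,
                       eos_token_id: int = EOS_TOKEN_ID) -> List[int]:
    """Drop the prompt tokens, then cut at the first EOS token if one is present."""
    completion = output_ids[prompt_len:]
    if eos_token_id in completion:
        completion = completion[:completion.index(eos_token_id)]
    return completion


# Toy ids, made up for illustration: 3 prompt tokens, 4 generated tokens, EOS, padding.
prompt = [1, 733, 16289]
generated = [22557, 28808, 1602, 541, 2, 0, 0]
print(extract_completion(prompt + generated, len(prompt)))
```

If the EOS id is wrong or missing, the cut never happens and generation runs until it hits a length limit, which is one reason the EOS setting matters for the quantized files.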

Format: GGUF
Model size: 46.7B params
Architecture: llama