gobean: quants at q4_0, q5_0, and q8_0, since it's a Mixtral. Manually set the EOS token id to work around a llama.cpp bug.
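
If you need to apply the same EOS fix to your own GGUF files, the metadata can be patched in place with the gguf Python package (pip install gguf), following the approach of llama.cpp's gguf_set_metadata.py script. This is a minimal sketch rather than the exact command used for these quants; the filename is hypothetical, and 2 is the standard </s> token id for the Mistral/Mixtral tokenizer.

from gguf import GGUFReader

# Hypothetical filename; open memory-mapped in "r+" mode so edits write through to the file.
reader = GGUFReader("mixtral-8x7b-orpo-v2.q4_0.gguf", "r+")

# tokenizer.ggml.eos_token_id is the standard GGUF metadata key for the EOS token.
field = reader.get_field("tokenizer.ggml.eos_token_id")

# field.data[0] indexes the part holding the scalar value; overwrite it in place.
print("current EOS id:", field.parts[field.data[0]][0])
field.parts[field.data[0]][0] = 2  # assumption: 2 = </s> in the Mistral/Mixtral vocab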

- Original Model Card -

https://huggingface.co/abhishek/autotrain-mixtral-8x7b-orpo-v2

Model Trained Using AutoTrain

This model was trained using AutoTrain. For more information, please visit the AutoTrain project page.

Usage


from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "PATH_TO_THIS_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
# Move inputs to the model's device (picked by device_map="auto") and cap the generation length.
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=256)
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)
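
Note that the transformers snippet above comes from the original full-precision model card. To run the quantized GGUF files actually hosted in this repo, llama-cpp-python is one option. A minimal sketch, assuming llama-cpp-python is installed and using a hypothetical filename:

from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-orpo-v2.q4_0.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# create_chat_completion applies the model's chat template for you.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hi"}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])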
Format: GGUF · Model size: 46.7B params · Architecture: llama