GGUFs of ToxicHermes-2.5-Mistral-7B

This is a GGUF quantization of ToxicHermes-2.5-Mistral-7B.

Original Model Card:

ToxicHermes

OpenHermes-2.5 model + toxic-dpo Dataset = ToxicHermes

Fine-tuned with Direct Preference Optimization (DPO).

Base Model: teknium/OpenHermes-2.5-Mistral-7B
Dataset: unalignment/toxic-dpo-v0.1

Usage

You can run this model using the following code:

import transformers
from transformers import AutoTokenizer


model = "joey00072/ToxicHermes-2.5-Mistral-7B"
# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,  # cap on total length: prompt tokens plus generated tokens
)
print(sequences[0]['generated_text'])
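
To make the prompt-formatting step above more concrete, here is a minimal sketch of the string that `apply_chat_template` is expected to produce. It assumes the tokenizer ships the ChatML template used by the OpenHermes-2.5 base model; `build_chatml_prompt` is a hypothetical helper written for illustration, not part of transformers, and the real template in the tokenizer is authoritative.

```python
# Hypothetical helper: renders messages in ChatML layout, assuming the
# tokenizer's chat template follows the OpenHermes-2.5 (ChatML) convention.
def build_chatml_prompt(messages, add_generation_prompt=True):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
prompt = build_chatml_prompt(message)
print(prompt)
```

Building the string by hand like this can be useful with runtimes that take a raw prompt, such as GGUF loaders, provided the layout matches the model's actual template.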

Training hyperparameters

LoRA:

r=16
lora_alpha=16
lora_dropout=0.05
bias="none"
task_type="CAUSAL_LM"
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
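
As a rough illustration of what these LoRA settings imply, the sketch below counts the trainable adapter parameters. The layer shapes are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); they are not stated in this card.

```python
# Values from the LoRA settings listed above.
R = 16
LORA_ALPHA = 16

# Assumed Mistral-7B shapes (not stated in this card).
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 8 * 128, 32

# (in_features, out_features) for each targeted projection in one layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r=R):
    # LoRA adds two matrices per module: A (r x in) and B (out x r).
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"per layer: {per_layer:,}  total: {total:,}  scaling: {scaling}")
```

Under these assumed shapes the adapters come to about 42 million trainable parameters, a small fraction of the roughly 7 billion base weights, and `lora_alpha / r` gives a scaling factor of 1.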

Training arguments:

per_device_train_batch_size=4
gradient_accumulation_steps=4
gradient_checkpointing=True
learning_rate=5e-5
lr_scheduler_type="cosine"
max_steps=200
optim="paged_adamw_32bit"
warmup_steps=100
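
The following sketch shows what these arguments imply for the learning-rate curve and the amount of data seen. It assumes the usual shape of a "cosine" schedule with warmup (linear ramp to the peak, then a half-cosine decay to zero); `lr_at` is a hypothetical helper, and the number of devices used for training is not stated in this card.

```python
import math

# Values from the training arguments listed above.
PEAK_LR = 5e-5
WARMUP_STEPS = 100
MAX_STEPS = 200
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4


def lr_at(step):
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Preference pairs consumed per optimizer step on one device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Pairs seen over the whole run, per device.
pairs_seen = effective_batch * MAX_STEPS

for step in (0, 50, 100, 150, 200):
    print(step, f"{lr_at(step):.2e}")
print("effective batch per device:", effective_batch)
print("pairs seen per device:", pairs_seen)
```

With `warmup_steps=100` and `max_steps=200`, half of the run is spent warming up, so under this assumed schedule the peak learning rate is reached only at the midpoint.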

DPOTrainer:

beta=0.1
max_prompt_length=1024
max_length=1536
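
For readers unfamiliar with the role of `beta`, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python rather than with the trainer's internals. The log-probability values are made up for illustration; `dpo_loss` is a hypothetical helper and not the DPOTrainer API.

```python
import math

BETA = 0.1  # value from the DPOTrainer settings listed above


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Hypothetical helper: DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Illustrative, made-up log-probabilities.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # policy equals reference
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))   # policy prefers the chosen answer
```

The loss falls below `log(2)` once the policy favors the chosen response more strongly than the reference does. A small `beta` such as 0.1 scales that margin down, which in the usual reading of DPO keeps the policy closer to the reference model.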