File size: 3,532 Bytes
6293a7a 83105f0 6293a7a 83105f0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 |
---
library_name: transformers
tags:
- experimental
base_model:
- nbeerbower/llama-3-bophades-v1-8B
datasets:
- jondurbin/gutenberg-dpo-v0.1
- ResplendentAI/NSFW_RP_Format_DPO
- flammenai/Date-DPO-v1
- jondurbin/truthy-dpo-v0.1
license: other
license_name: llama3
---
# llama-3-sauce-v1-8B
This model is based on Llama-3-8b, and is governed by [META LLAMA 3 COMMUNITY LICENSE AGREEMENT](LICENSE)
This is a bad finetune on llama-3-bophades-v1-8B using various DPO sets.
# Method
Finetuned using an A100 on Google Colab.
[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
### Configuration
Dataset preparation:
```python
def chatml_format(example):
# Initialize formatted system message
system = ""
# Check if 'system' field exists and is not None
if example.get('system'):
message = {"role": "system", "content": example['system']}
system = tokenizer.apply_chat_template([message], tokenize=False)
# Format instruction
message = {"role": "user", "content": example['prompt']}
prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)
# Format chosen answer
chosen = example['chosen'] + "<|im_end|>\n"
# Format rejected answer
rejected = example['rejected'] + "<|im_end|>\n"
return {
"prompt": system + prompt,
"chosen": chosen,
"rejected": rejected,
}
# Array of datasets to concat
ds = [
"jondurbin/truthy-dpo-v0.1",
"ResplendentAI/NSFW_RP_Format_DPO",
"jondurbin/gutenberg-dpo-v0.1",
"flammenai/Date-DPO-v1"
]
# load_dataset and combine all
loaded_datasets = [load_dataset(dataset_name, split='train') for dataset_name in ds]
dataset = concatenate_datasets(loaded_datasets)
# Save columns
original_columns = dataset.column_names
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
# Format dataset
dataset = dataset.map(
chatml_format,
remove_columns=original_columns
)
```
LoRA, model, and training settings:
```python
# LoRA configuration
peft_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
learning_rate=5e-5,
lr_scheduler_type="cosine",
max_steps=420,
save_strategy="no",
logging_steps=1,
output_dir=new_model,
optim="paged_adamw_32bit",
warmup_steps=100,
bf16=True,
report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
model,
ref_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
peft_config=peft_config,
beta=0.1,
max_prompt_length=2048,
max_length=4096,
force_use_ref_model=True
)
# Fine-tune model with DPO
dpo_trainer.train()
``` |