metadata
license: mit
library_name: peft
tags:
  - trl
  - dpo
  - generated_from_trainer
  - distilabel
  - argilla
base_model: microsoft/phi-2
model-index:
  - name: phi2-lora-quantized-distilabel-intel-orca-dpo-pairs
    results: []
datasets:
  - argilla/distilabel-intel-orca-dpo-pairs
language:
  - en
pipeline_tag: text-generation

phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of microsoft/phi-2 on the argilla/distilabel-intel-orca-dpo-pairs dataset. The full training notebook can be found here.

It achieves the following results on the evaluation set:

  • Loss: 0.4537
  • Rewards/chosen: -0.0837
  • Rewards/rejected: -1.2628
  • Rewards/accuracies: 0.8301
  • Rewards/margins: 1.1791
  • Logps/rejected: -224.8409
  • Logps/chosen: -203.2228
  • Logits/rejected: 0.4773
  • Logits/chosen: 0.3062
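
The reward metrics above are linked by simple arithmetic: the margin is the chosen reward minus the rejected reward. A minimal check using the reported numbers (the variable names are illustrative and not taken from the training code):

```python
# Values copied from the evaluation results above.
rewards_chosen = -0.0837
rewards_rejected = -1.2628

# Margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.1791
```

A positive margin means the adapter assigns a higher implicit reward to the chosen response than to the rejected one, which is consistent with the reported accuracy of about 0.83.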

Model description

The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the distilabel-intel-orca-dpo-pairs dataset. To scale LoRA approaches for LLMs, I recommend looking at predibase/lorax.

You can try the model with the code below. It loads the LoRA adapter and, only when CUDA is available, a bitsandbytes quantization config.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftConfig, PeftModel

# template used for fine-tune
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

adapter_id = "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit NF4 quantization through bitsandbytes, CUDA only
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

# adapter config, useful for inspecting the LoRA settings
config = PeftConfig.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
model = PeftModel.from_pretrained(model, adapter_id)
if bnb_config is None:
    # a 4-bit model is already placed on the GPU at load time; only move unquantized weights
    model = model.to(device)

prompt = "Instruct: What is the capital of France? \nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

outputs = model.generate(**inputs)
text = tokenizer.batch_decode(outputs)[0]
print(text)
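
Because the adapter was fine-tuned on the "Instruct: ... Output: ..." template, prompts should follow the same shape. The helpers below are a small sketch of that formatting. The names `build_prompt` and `extract_response` are hypothetical, the single newline before "Output:" follows the example prompt above, and the `<|endoftext|>` end-of-text token is an assumption about the phi-2 tokenizer.

```python
# Prompt shape taken from the example prompt above (an assumption, not the training code).
TEMPLATE = "Instruct: {instruction}\nOutput:"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Instruct/Output template."""
    return TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated_text: str) -> str:
    """Return the text after the first 'Output:' marker, without the end-of-text token."""
    response = generated_text.split("Output:", 1)[-1]
    # ASSUMPTION: the tokenizer marks the end of a sequence with <|endoftext|>.
    return response.replace("<|endoftext|>", "").strip()


print(build_prompt("What is the capital of France?"))
```

The decoded generation can then be passed to `extract_response` to drop the prompt and keep only the answer.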

Intended uses & limitations

This is a LoRA adapter fine-tuned for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the hyperparameters.

Training and evaluation data

The adapter was fine-tuned on a Google Colab A100 GPU using DPO and the distilabel-intel-orca-dpo-pairs dataset. The full training notebook can be found here. The configs for the adapter and the trainer are listed below.

from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.5,
    r=32,
    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
    bias="none",
    task_type="CAUSAL_LM",
)
training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1, # Modified for tutorial purposes
    max_steps=100,
    warmup_steps=20,
    lr_scheduler_type="linear",
)
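
To give a feel for what the LoRA settings above imply, the sketch below computes the scaling factor applied to the low-rank update (`lora_alpha / r`, the PEFT default) and a rough count of the trainable parameters. The phi-2 dimensions used here (hidden size 2560, MLP size 10240, 32 layers) are assumptions that should be checked against the model config, and `lora_params` is a hypothetical helper.

```python
# LoRA settings from the config above.
r = 32
lora_alpha = 16

# Factor applied to the low-rank update B @ A.
scaling = lora_alpha / r
print(scaling)  # 0.5


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# ASSUMPTION: phi-2 dimensions; verify against the model config before relying on them.
hidden, mlp, layers = 2560, 10240, 32

per_layer = (
    3 * lora_params(hidden, hidden, r)  # q_proj, k_proj, v_proj
    + lora_params(hidden, mlp, r)       # fc1
    + lora_params(mlp, hidden, r)       # fc2
)
print(per_layer * layers)  # 41943040
```

Under these assumptions the adapter trains roughly 42M parameters, a small fraction of the base model.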

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 1
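
The total train batch size and the learning-rate curve follow from the values listed above. The sketch below reproduces the arithmetic and a linear warmup and decay schedule. It assumes a single GPU and 320 total optimizer steps (the last step in the results table), and `linear_schedule_lr` is an illustrative helper rather than the trainer's own code.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 16
num_devices = 1  # ASSUMPTION: a single GPU

total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 32


def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# ASSUMPTION: 320 total steps, taken from the final row of the results table.
for step in (10, 20, 170, 320):
    print(step, linear_schedule_lr(step, 1e-5, 20, 320))
```

With these settings the learning rate peaks at 1e-5 at step 20 and decays linearly to zero by the final step.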

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6853 | 0.06 | 20 | 0.6701 | 0.0133 | -0.0368 | 0.6905 | 0.0501 | -212.5803 | -202.2522 | 0.3853 | 0.2532 |
| 0.6312 | 0.12 | 40 | 0.5884 | 0.0422 | -0.2208 | 0.8138 | 0.2630 | -214.4207 | -201.9638 | 0.4254 | 0.2816 |
| 0.547 | 0.19 | 60 | 0.5146 | 0.0172 | -0.5786 | 0.8278 | 0.5958 | -217.9983 | -202.2132 | 0.4699 | 0.3110 |
| 0.4388 | 0.25 | 80 | 0.4893 | -0.0808 | -1.0789 | 0.8293 | 0.9981 | -223.0014 | -203.1934 | 0.5158 | 0.3396 |
| 0.4871 | 0.31 | 100 | 0.4818 | -0.1298 | -1.2346 | 0.8297 | 1.1048 | -224.5586 | -203.6837 | 0.5133 | 0.3340 |
| 0.4863 | 0.37 | 120 | 0.4723 | -0.1230 | -1.1718 | 0.8301 | 1.0488 | -223.9305 | -203.6159 | 0.4910 | 0.3167 |
| 0.4578 | 0.44 | 140 | 0.4666 | -0.1257 | -1.1772 | 0.8301 | 1.0515 | -223.9844 | -203.6428 | 0.4795 | 0.3078 |
| 0.4587 | 0.5 | 160 | 0.4625 | -0.0746 | -1.1272 | 0.8301 | 1.0526 | -223.4841 | -203.1310 | 0.4857 | 0.3139 |
| 0.4688 | 0.56 | 180 | 0.4595 | -0.0584 | -1.1194 | 0.8297 | 1.0610 | -223.4062 | -202.9692 | 0.4890 | 0.3171 |
| 0.4189 | 0.62 | 200 | 0.4579 | -0.0666 | -1.1647 | 0.8297 | 1.0982 | -223.8598 | -203.0511 | 0.4858 | 0.3138 |
| 0.4392 | 0.68 | 220 | 0.4564 | -0.0697 | -1.1915 | 0.8301 | 1.1219 | -224.1278 | -203.0823 | 0.4824 | 0.3110 |
| 0.4659 | 0.75 | 240 | 0.4554 | -0.0826 | -1.2245 | 0.8301 | 1.1419 | -224.4574 | -203.2112 | 0.4761 | 0.3052 |
| 0.4075 | 0.81 | 260 | 0.4544 | -0.0823 | -1.2328 | 0.8301 | 1.1504 | -224.5403 | -203.2089 | 0.4749 | 0.3044 |
| 0.4015 | 0.87 | 280 | 0.4543 | -0.0833 | -1.2590 | 0.8301 | 1.1757 | -224.8026 | -203.2188 | 0.4779 | 0.3067 |
| 0.4365 | 0.93 | 300 | 0.4539 | -0.0846 | -1.2658 | 0.8301 | 1.1812 | -224.8702 | -203.2313 | 0.4780 | 0.3067 |
| 0.4589 | 1.0 | 320 | 0.4537 | -0.0837 | -1.2628 | 0.8301 | 1.1791 | -224.8409 | -203.2228 | 0.4773 | 0.3062 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1