phi2-lora-quantized-distilabel-intel-orca-dpo-pairs

This model is a fine-tuned version of microsoft/phi-2 on distilabel-intel-orca-dpo-pairs. The full training notebook can be found here.

It achieves the following results on the evaluation set:

  • Loss: 0.4537
  • Rewards/chosen: -0.0837
  • Rewards/rejected: -1.2628
  • Rewards/accuracies: 0.8301
  • Rewards/margins: 1.1791
  • Logps/rejected: -224.8409
  • Logps/chosen: -203.2228
  • Logits/rejected: 0.4773
  • Logits/chosen: 0.3062
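
The reward figures above are related by simple arithmetic. Assuming they were logged by TRL's `DPOTrainer` (an assumption based on the metric names, not something this card states), the margin is the chosen reward minus the rejected reward. A minimal check using the reported values:

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.0837
rewards_rejected = -1.2628
reported_margin = 1.1791

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected
print(f"computed margin: {margin:.4f}")

# The recomputed value should agree with the reported one up to rounding.
assert abs(margin - reported_margin) < 1e-4
```

Under the same assumption, a Rewards/accuracies value of 0.8301 means the adapter scores the chosen response above the rejected one on roughly 83% of the evaluation pairs.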

Model description

The adapter was fine-tuned on a Google Colab A100 GPU using DPO on the distilabel-intel-orca-dpo-pairs dataset. To scale LoRA approaches for LLMs, I recommend looking at predibase/lorax.

You can try the model with the code below. It loads the LoRA adapter and, only when CUDA is available, a bitsandbytes quantization config.

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftConfig, PeftModel

# template used for fine-tune
# template = """\
# Instruct: {instruction}\n
# Output: {response}"""

adapter_id = "davidberenstein1957/phi2-lora-quantized-distilabel-intel-orca-dpo-pairs"

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(0)}")
    # 4-bit quantization is only configured when CUDA is available
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype='float16',
        bnb_4bit_use_double_quant=False,
    )
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    bnb_config = None
else:
    device = torch.device("cpu")
    bnb_config = None
    print("No GPU available, using CPU instead.")

# adapter configuration (records the base model and the LoRA settings)
config = PeftConfig.from_pretrained(adapter_id)

# load the base model and its tokenizer, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float16, quantization_config=bnb_config)
model = PeftModel.from_pretrained(model, adapter_id)
if bnb_config is None:
    # a quantized model is already placed on the GPU when it is loaded
    model = model.to(device)

prompt = "Instruct: What is the capital of France?\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(device)

# max_new_tokens is an illustrative value; adjust it to the expected answer length
outputs = model.generate(**inputs, max_new_tokens=64)
text = tokenizer.batch_decode(outputs)[0]
print(text)
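
The prompt format used above can be wrapped in two small helpers. This is only a sketch: `build_prompt` and `extract_response` are hypothetical names introduced here, and the decoded string in the example is made up for illustration rather than real model output.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction the same way as the inference prompt above."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def extract_response(generated_text: str) -> str:
    """Return the text after the first 'Output:' marker, without the end token."""
    _, _, response = generated_text.partition("Output:")
    # phi-2's tokenizer marks the end of a sequence with <|endoftext|>.
    return response.split("<|endoftext|>")[0].strip()


prompt = build_prompt("What is the capital of France?")
print(repr(prompt))

# A made-up decoded string, used only to illustrate the parsing step.
example = prompt + " The capital of France is Paris.<|endoftext|>"
print(extract_response(example))
```

Keeping the prompt close to the fine-tuning template matters because the adapter was trained on text in that `Instruct:` / `Output:` layout.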

Intended uses & limitations

This is a LoRA adapter fine-tuned for phi-2, not a full fine-tune of the model. Additionally, I did not spend time tuning the training parameters.

Training and evaluation data

The adapter was fine-tuned on a Google Colab A100 GPU using DPO on the distilabel-intel-orca-dpo-pairs dataset. The full training notebook can be found here. The configs for the adapter and the trainer are shown below.

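from peft import LoraConfig
from transformers import TrainingArguments

# Excerpt: `model_name` is assumed to be defined earlier.
peft_config = LoraConfig(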
    lora_alpha=16,
    lora_dropout=0.5,
    r=32,
    target_modules=['k_proj', 'q_proj', 'v_proj', 'fc1', 'fc2'],
    bias="none",
    task_type="CAUSAL_LM",
)
training_arguments = TrainingArguments(
    output_dir=f"./{model_name}",
    evaluation_strategy="steps",
    do_eval=True,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    per_device_eval_batch_size=2,
    log_level="debug",
    save_steps=20,
    logging_steps=20,
    learning_rate=1e-5,
    eval_steps=20,
    num_train_epochs=1, # Modified for tutorial purposes
    max_steps=100,
    warmup_steps=20,
    lr_scheduler_type="linear",
)
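
To make the LoRA settings above more concrete, the sketch below computes the scaling factor applied to the low-rank update (`lora_alpha / r` in standard LoRA) and a rough count of trainable adapter weights. The layer dimensions (hidden size 2560, MLP size 10240, 32 layers) are assumptions taken from the published phi-2 configuration, not from this card, and `lora_param_count` is a hypothetical helper.

```python
lora_alpha = 16
r = 32

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = lora_alpha / r
print(f"LoRA scaling factor: {scaling}")


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Weights added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Assumed phi-2 dimensions (not stated in this model card).
HIDDEN, MLP, LAYERS = 2560, 10240, 32

per_layer = (
    3 * lora_param_count(HIDDEN, HIDDEN, r)  # q_proj, k_proj, v_proj
    + lora_param_count(HIDDEN, MLP, r)       # fc1
    + lora_param_count(MLP, HIDDEN, r)       # fc2
)
total = per_layer * LAYERS
print(f"approximate trainable LoRA weights: {total:,}")
```

With `bias="none"`, no bias terms are trained, so this count covers only the low-rank matrices.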

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 1
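
Two of the values above follow from the others. The sketch below recomputes the total train batch size and illustrates a linear schedule with warmup. It assumes a single device and a total of 320 optimizer steps (taken from the last row of the training results table), and `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and decay rule rather than the trainer's own code.

```python
train_batch_size = 2
gradient_accumulation_steps = 16
num_devices = 1  # assumption: a single Colab GPU

# Samples contributing to each optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(f"total_train_batch_size: {total_train_batch_size}")


def linear_lr(step: int, base_lr: float = 1e-5,
              warmup_steps: int = 20, total_steps: int = 320) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 10, 20, 170, 320):
    print(step, linear_lr(step))
```

The learning rate peaks at step 20, the end of warmup, and reaches zero at the final step under these assumptions.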

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6853 | 0.06 | 20 | 0.6701 | 0.0133 | -0.0368 | 0.6905 | 0.0501 | -212.5803 | -202.2522 | 0.3853 | 0.2532 |
| 0.6312 | 0.12 | 40 | 0.5884 | 0.0422 | -0.2208 | 0.8138 | 0.2630 | -214.4207 | -201.9638 | 0.4254 | 0.2816 |
| 0.547 | 0.19 | 60 | 0.5146 | 0.0172 | -0.5786 | 0.8278 | 0.5958 | -217.9983 | -202.2132 | 0.4699 | 0.3110 |
| 0.4388 | 0.25 | 80 | 0.4893 | -0.0808 | -1.0789 | 0.8293 | 0.9981 | -223.0014 | -203.1934 | 0.5158 | 0.3396 |
| 0.4871 | 0.31 | 100 | 0.4818 | -0.1298 | -1.2346 | 0.8297 | 1.1048 | -224.5586 | -203.6837 | 0.5133 | 0.3340 |
| 0.4863 | 0.37 | 120 | 0.4723 | -0.1230 | -1.1718 | 0.8301 | 1.0488 | -223.9305 | -203.6159 | 0.4910 | 0.3167 |
| 0.4578 | 0.44 | 140 | 0.4666 | -0.1257 | -1.1772 | 0.8301 | 1.0515 | -223.9844 | -203.6428 | 0.4795 | 0.3078 |
| 0.4587 | 0.5 | 160 | 0.4625 | -0.0746 | -1.1272 | 0.8301 | 1.0526 | -223.4841 | -203.1310 | 0.4857 | 0.3139 |
| 0.4688 | 0.56 | 180 | 0.4595 | -0.0584 | -1.1194 | 0.8297 | 1.0610 | -223.4062 | -202.9692 | 0.4890 | 0.3171 |
| 0.4189 | 0.62 | 200 | 0.4579 | -0.0666 | -1.1647 | 0.8297 | 1.0982 | -223.8598 | -203.0511 | 0.4858 | 0.3138 |
| 0.4392 | 0.68 | 220 | 0.4564 | -0.0697 | -1.1915 | 0.8301 | 1.1219 | -224.1278 | -203.0823 | 0.4824 | 0.3110 |
| 0.4659 | 0.75 | 240 | 0.4554 | -0.0826 | -1.2245 | 0.8301 | 1.1419 | -224.4574 | -203.2112 | 0.4761 | 0.3052 |
| 0.4075 | 0.81 | 260 | 0.4544 | -0.0823 | -1.2328 | 0.8301 | 1.1504 | -224.5403 | -203.2089 | 0.4749 | 0.3044 |
| 0.4015 | 0.87 | 280 | 0.4543 | -0.0833 | -1.2590 | 0.8301 | 1.1757 | -224.8026 | -203.2188 | 0.4779 | 0.3067 |
| 0.4365 | 0.93 | 300 | 0.4539 | -0.0846 | -1.2658 | 0.8301 | 1.1812 | -224.8702 | -203.2313 | 0.4780 | 0.3067 |
| 0.4589 | 1.0 | 320 | 0.4537 | -0.0837 | -1.2628 | 0.8301 | 1.1791 | -224.8409 | -203.2228 | 0.4773 | 0.3062 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.37.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1