
Model Card for Llama-3-8B-Instruct-Iterative-SamPO

This repository provides a fine-tuned version of Llama-3-8B-Instruct, trained with our proposed SamPO algorithm: Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence. We comply with all licenses of the Llama 3 release.
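
For orientation, the vanilla DPO objective that SamPO builds on can be sketched as below. This is a minimal illustration only: SamPO's contribution is a down-sampled computation of the sequence log-probability terms (as described in the paper), which this snippet does not implement. All function and variable names are hypothetical.

```python
import math

def dpo_loss(beta, pi_logp_w, ref_logp_w, pi_logp_l, ref_logp_l):
    """Vanilla DPO loss for one preference pair (sketch, not SamPO itself).

    pi_logp_*  : sequence log-probs under the policy for the chosen (w)
                 and rejected (l) responses.
    ref_logp_* : the same quantities under the frozen reference model.
    """
    # Implicit reward margin between chosen and rejected responses.
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    # -log(sigmoid(margin)): small when the chosen response is preferred.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

A larger margin in favor of the chosen response yields a smaller loss; a zero margin gives log 2.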

Performance

| Model | GSM8K | IFEval | PiQA | MMLU | TruthfulQA | AlpacaEval2 | LC AlpacaEval2 | Length in Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama3-8B-Instruct | 75.06 | 49.40 | 80.69 | 63.85 | 36.47 | 22.57 | 22.92 | 421 |
| Llama3-8B-Instruct-DPO | 75.59 | 51.80 | 81.94 | 64.06 | 40.39 | 23.34 | 23.20 | 422 |
| Llama3-8B-Instruct-Iterative-DPO | 74.91 | 52.52 | 81.66 | 64.02 | 39.90 | 23.92 | 25.50 | 403 |
| Llama3-8B-Instruct-Iterative-SamPO | 77.81 | 60.55 | 81.18 | 64.12 | 44.07 | 30.68 | 35.14 | 377 |

Evaluation Details

Five conditional benchmarks, using lm-evaluation-harness:

  • GSM8K: 8-shot, report strict match
  • IFEval: 3-shot, report instruction-level strict accuracy
  • PiQA: 3-shot, report accuracy
  • MMLU: 0-shot, report normalized accuracy
  • TruthfulQA: 3-shot, report accuracy of single-true mc1 setting

One open-ended benchmark, using official alpaca_eval:

  • AlpacaEval2: win rate (%) judged by GPT-4-turbo, comparing the model's outputs against GPT-4-turbo's reference responses
  • LC AlpacaEval2: length-debiased win rate (%) of AlpacaEval2
  • Length in Tokens: the average output length on AlpacaEval2, measured in tokens with the Llama 3 tokenizer

Input Format

The model is trained to use the following format:

```
<|start_header_id|>user<|end_header_id|>

{PROMPT}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

{Response}
```
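
The template above can be rendered with a small helper like the following (a hypothetical sketch; in practice, `tokenizer.apply_chat_template` from `transformers` produces an equivalent prompt for Llama 3 models):

```python
def build_prompt(user_message: str) -> str:
    """Render the Llama-3 chat template shown above for a single user turn,
    leaving the assistant header open for generation."""
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt("What is DPO?")
```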

Training hyperparameters

The following hyperparameters were used during DPO/SamPO training:

  • DPO beta: 0.1
  • learning_rate: 4e-7
  • total_train_batch_size: 128
  • optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • Weight Decay: 0.0
  • num_epochs: 3.0
  • The input format above is applied to all training samples
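
The linear schedule with a 0.1 warmup ratio listed above can be sketched as follows (a hypothetical helper for illustration; trainers such as `transformers` provide this via `get_linear_schedule_with_warmup`):

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=4e-7, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup over the first
    warmup_ratio of training, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```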