ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k - a G-reen Collection

G-reen 's Collections

ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k

ORPO v DPO v SFT + Training Loss Curves; argilla/dpo-mix-7k

updated Jul 2, 2024

Several trained models to compare the differences between each method. Each model has a complete description of hyperparams with wandb reports.

unsloth/mistral-7b-v0.2-bnb-4bit

Text Generation • Updated Sep 11, 2024 • 2.34k • 14

Note All training runs were done on this model (4 bit qlora). Go unsloth!
argilla/dpo-mix-7k

Viewer • Updated Jul 16, 2024 • 7.5k • 450 • 159

Note Used this entire dataset for training. For SFT, the rejected part of the dataset was ignored.
G-reen/EXPERIMENT-DPO-m7b2-1-merged

Text Generation • Updated Apr 15, 2024 • 161
Note The image shows a comparison between all the completed DPO runs.
G-reen/EXPERIMENT-DPO-m7b2-2-merged

Text Generation • Updated Apr 15, 2024 • 163

Note Probably the best loss curve at lr=5e-5.
G-reen/EXPERIMENT-DPO-m7b2-3-merged

Text Generation • Updated Apr 15, 2024 • 164

Note Failed to train, definitely do not use
G-reen/EXPERIMENT-DPO-m7b2-4-merged

Text Generation • Updated Apr 5, 2024 • 163
G-reen/EXPERIMENT-SFT-m7b2-1-merged

Text Generation • Updated Apr 15, 2024 • 83
Note The image shows a comparison between all the completed SFT runs.
G-reen/EXPERIMENT-SFT-m7b2-2-merged

Text Generation • Updated Apr 15, 2024 • 160

Note Probably the best loss curve at lr=5e-5.
G-reen/EXPERIMENT-SFT-m7b2-3-merged

Text Generation • Updated Apr 15, 2024 • 82
G-reen/EXPERIMENT-ORPO-m7b2-1-merged

Text Generation • Updated Apr 16, 2024 • 89
G-reen/EXPERIMENT-ORPO-m7b2-2-merged

Text Generation • Updated Apr 19, 2024 • 84