---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_l21_entropy
    results: []
---

# qwen_l21_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6612
- Rewards/chosen: -4.9613
- Rewards/rejected: -8.3580
- Rewards/accuracies: 0.6766
- Rewards/margins: 3.3967
- Logps/rejected: -8.3580
- Logps/chosen: -4.9613
- Logits/rejected: 1.3373
- Logits/chosen: 0.9296
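
As a quick start, here is a minimal inference sketch, assuming the checkpoint is published as `yakazimir/qwen_l21_entropy` and inherits the base model's chat template; adjust the repo id and generation settings to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_l21_entropy"  # assumed repo id; replace if different

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format a single-turn chat prompt (assumes the tokenizer ships a chat template).
messages = [{"role": "user", "content": "Explain preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```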

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
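
The card's tags point to SimPO-style preference optimization, but the exact training script is not stated here. Purely as an illustrative sketch, the hyperparameters above could be mapped onto TRL's `CPOTrainer`, which implements the SimPO objective via `loss_type="simpo"`; the `simpo_gamma`/`cpo_alpha` values, the dataset split names, and the expected `prompt`/`chosen`/`rejected` column format are assumptions, not facts from this card, and argument names may differ across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base_model = "trl-lib/qwen1.5-0.5b-sft"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Assumes the dataset exposes "train"/"test" splits with prompt/chosen/rejected columns.
dataset = load_dataset("yakazimir/ultrafeedback_binarized")

# Values below mirror the card; simpo_gamma and cpo_alpha are illustrative placeholders.
args = CPOConfig(
    output_dir="qwen_l21_entropy",
    loss_type="simpo",               # SimPO loss inside CPOTrainer
    cpo_alpha=0.0,                   # disable the auxiliary NLL term (assumption)
    simpo_gamma=0.5,                 # target reward margin (placeholder, not stated in the card)
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```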

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6893        | 0.2141 | 400  | 0.6976          | -5.6399        | -5.6514          | 0.5134             | 0.0115          | -5.6514        | -5.6399      | 0.6073          | 0.4970        |
| 0.6905        | 0.4282 | 800  | 0.6888          | -9.5942        | -10.2217         | 0.5772             | 0.6275          | -10.2217       | -9.5942      | 0.9367          | 0.7851        |
| 0.6827        | 0.6422 | 1200 | 0.6809          | -3.7037        | -4.6831          | 0.6417             | 0.9794          | -4.6831        | -3.7037      | 0.4628          | 0.3100        |
| 0.665         | 0.8563 | 1600 | 0.6737          | -4.1597        | -6.3017          | 0.6588             | 2.1420          | -6.3017        | -4.1597      | 0.9087          | 0.6452        |
| 0.674         | 1.0704 | 2000 | 0.6702          | -4.7093        | -7.4594          | 0.6677             | 2.7501          | -7.4594        | -4.7093      | 1.0243          | 0.7072        |
| 0.6648        | 1.2845 | 2400 | 0.6651          | -4.2327        | -7.0267          | 0.6654             | 2.7940          | -7.0267        | -4.2327      | 0.9760          | 0.6519        |
| 0.6665        | 1.4986 | 2800 | 0.6654          | -4.6367        | -7.6607          | 0.6706             | 3.0240          | -7.6607        | -4.6367      | 1.0821          | 0.7239        |
| 0.6746        | 1.7127 | 3200 | 0.6641          | -5.1015        | -8.2207          | 0.6803             | 3.1192          | -8.2207        | -5.1015      | 1.0711          | 0.6993        |
| 0.6634        | 1.9267 | 3600 | 0.6629          | -4.7411        | -7.8576          | 0.6855             | 3.1165          | -7.8576        | -4.7411      | 1.0738          | 0.7086        |
| 0.6224        | 2.1408 | 4000 | 0.6607          | -4.6523        | -7.8867          | 0.6818             | 3.2344          | -7.8867        | -4.6523      | 1.1108          | 0.7335        |
| 0.6604        | 2.3549 | 4400 | 0.6618          | -4.7746        | -8.0447          | 0.6780             | 3.2700          | -8.0447        | -4.7746      | 1.2654          | 0.8695        |
| 0.6512        | 2.5690 | 4800 | 0.6615          | -4.9147        | -8.2777          | 0.6773             | 3.3630          | -8.2777        | -4.9147      | 1.2819          | 0.8805        |
| 0.6594        | 2.7831 | 5200 | 0.6611          | -4.9802        | -8.3859          | 0.6795             | 3.4057          | -8.3859        | -4.9802      | 1.2711          | 0.8676        |
| 0.6402        | 2.9972 | 5600 | 0.6612          | -4.9613        | -8.3580          | 0.6766             | 3.3967          | -8.3580        | -4.9613      | 1.3373          | 0.9296        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1