Llama0-3-8b-ultra-p-0.05-lr1e-6-e3

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a note on how these reward metrics relate follows the list):

  • Loss: 0.5678
  • Rewards/chosen: -3.1024
  • Rewards/rejected: -5.3878
  • Rewards/accuracies: 0.7656
  • Rewards/margins: 2.2855
  • Logps/rejected: -803.3770
  • Logps/chosen: -566.8709
  • Logits/rejected: -0.7286
  • Logits/chosen: -0.3055
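
The reward and log-probability names above match what TRL's DPOTrainer logs, which suggests a DPO-style preference-optimization run; the card itself does not name the training method, so treat that as an assumption. Under the standard DPO convention, the implicit reward of a completion is the scaled log-probability ratio between the policy and the reference model, and the reported margin is simply the chosen reward minus the rejected reward, which is consistent with the numbers above up to rounding:

```latex
% Standard DPO convention (assumed; the training method is not stated in the card)
r_\theta(x, y) = \beta \bigl[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr]

% Reported margin = chosen reward - rejected reward
\text{margins} \approx -3.1024 - (-5.3878) = 2.2854 \quad (\text{reported: } 2.2855)

% Sigmoid DPO objective; under the same assumption this is what the reported Loss averages
\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```

Under that reading, Rewards/accuracies (0.7656) is the fraction of evaluation pairs for which the chosen completion receives a higher implicit reward than the rejected one.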

Model description

More information needed

Intended uses & limitations

More information needed
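
Since no usage notes are provided, here is a minimal inference sketch only. It assumes the checkpoint is published as tongliuphysics/Llama0-3-8b-ultra-p-0.05-lr1e-6-e3 and that, like the base meta-llama/Meta-Llama-3-8B-Instruct, it follows the Llama 3 chat template.

```python
# Minimal inference sketch (assumptions: the repo id below and the Llama 3 chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tongliuphysics/Llama0-3-8b-ultra-p-0.05-lr1e-6-e3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what preference fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```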

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
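
The card does not say which trainer produced this run. The hyperparameters above, together with the reward/log-probability metrics, are consistent with TRL's DPOTrainer, so the sketch below shows how they would map onto a DPOConfig; the trainer choice, the beta value, and the dataset are assumptions rather than facts from the card. Note also that the effective batch sizes follow from the per-device settings: 2 per device x 8 GPUs x 8 accumulation steps = 128 for training, and 8 x 8 = 64 for evaluation.

```python
# Sketch of a TRL DPO run matching the listed hyperparameters.
# Assumptions: TRL's DPOTrainer, an illustrative beta, and a placeholder preference dataset.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your/preference-dataset")

config = DPOConfig(
    output_dir="Llama0-3-8b-ultra-p-0.05-lr1e-6-e3",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 2 x 8 GPUs x 8 steps = 128 effective train batch
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
    bf16=True,                      # assumed from the BF16 checkpoint dtype
    beta=0.1,                       # assumed; the card does not report beta
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # recent TRL versions take processing_class= instead
)
trainer.train()  # a frozen reference copy of the policy is created automatically
```

Launched across 8 processes (e.g. with accelerate or torchrun), the per-device settings above reproduce the reported total train/eval batch sizes of 128 and 64.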

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5867        | 0.2060 | 100  | 0.5790          | -0.4168        | -0.7897          | 0.6797             | 0.3729          | -343.5607      | -298.3142    | 0.0683          | 0.0050        |
| 0.5459        | 0.4119 | 200  | 0.5337          | -0.8336        | -1.5839          | 0.7422             | 0.7504          | -422.9892      | -339.9911    | 0.4007          | 0.2879        |
| 0.5201        | 0.6179 | 300  | 0.5116          | -0.7067        | -1.5136          | 0.7344             | 0.8069          | -415.9542      | -327.3016    | 0.3623          | 0.2661        |
| 0.5068        | 0.8239 | 400  | 0.5037          | -0.7404        | -1.6591          | 0.7891             | 0.9187          | -430.5064      | -330.6776    | 0.2848          | 0.2141        |
| 0.4270        | 1.0299 | 500  | 0.5057          | -1.4842        | -2.9740          | 0.7500             | 1.4898          | -561.9933      | -405.0575    | -0.1430         | -0.0848       |
| 0.3367        | 1.2358 | 600  | 0.5150          | -1.9307        | -3.6670          | 0.7500             | 1.7363          | -631.2911      | -449.7062    | -0.1170         | -0.0016       |
| 0.3360        | 1.4418 | 700  | 0.5013          | -1.6315        | -3.1525          | 0.7656             | 1.5211          | -579.8499      | -419.7817    | -0.0661         | 0.0619        |
| 0.3443        | 1.6478 | 800  | 0.4919          | -1.5274        | -2.9336          | 0.7656             | 1.4062          | -557.9580      | -409.3778    | -0.0808         | 0.0430        |
| 0.3387        | 1.8538 | 900  | 0.5136          | -1.8875        | -3.4761          | 0.7578             | 1.5886          | -612.2042      | -445.3885    | -0.0675         | 0.0881        |
| 0.2045        | 2.0597 | 1000 | 0.5396          | -2.6871        | -4.6850          | 0.7656             | 1.9979          | -733.0979      | -525.3492    | -0.3513         | -0.1306       |
| 0.1911        | 2.2657 | 1100 | 0.5562          | -3.0265        | -5.1837          | 0.7422             | 2.1572          | -782.9683      | -559.2891    | -0.6321         | -0.2757       |
| 0.1935        | 2.4717 | 1200 | 0.5518          | -2.8870        | -5.0043          | 0.7500             | 2.1173          | -765.0246      | -545.3388    | -0.6105         | -0.2462       |
| 0.1909        | 2.6777 | 1300 | 0.5623          | -3.0447        | -5.2451          | 0.7500             | 2.2004          | -789.1040      | -561.1038    | -0.6371         | -0.2728       |
| 0.1805        | 2.8836 | 1400 | 0.5746          | -3.2314        | -5.5860          | 0.7500             | 2.3546          | -823.1945      | -579.7725    | -0.7721         | -0.3436       |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1