
This model was trained with Iterative DPO using OpenRLHF.
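For readers unfamiliar with the objective, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not taken from OpenRLHF; the function name `dpo_loss` and its argument names are hypothetical, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The default `beta` matches the value listed in the hyperparameters.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch)."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -20.0, -10.0, -20.0))
```

The loss falls below log(2) once the policy raises the chosen response's likelihood relative to the rejected one, compared with the reference model.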

Datasets and Hyperparameters

Max Prompt Length: 2048
Max Response Length: 2048
best_of_n: 2 (2 samples for each prompt)
Learning Rate: 5e-7
Beta: 0.1
Scheduler: Cosine with Warmup (0.03) and MinLR (0.1 * init_lr)
Rollout Batch Size: 20000
Training Batch Size: 256
Number of Iterations: 9
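The learning-rate settings above can be sketched as a small function. This is an assumption about the schedule's shape (linear warmup over the first 3% of steps, then cosine decay to 10% of the initial rate), not OpenRLHF's actual implementation; `lr_at_step` and `total_steps` are hypothetical names, and the total step count depends on how many preference pairs each iteration produces.

```python
import math

def lr_at_step(step, total_steps, init_lr=5e-7,
               warmup_ratio=0.03, min_lr_ratio=0.1):
    """Learning rate at a 0-indexed optimizer step (illustrative sketch)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup from 0 toward init_lr.
        return init_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from init_lr down to min_lr_ratio * init_lr.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return init_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)

# Print the rate at a few points of a hypothetical 1000-step run.
for s in (0, 30, 500, 1000):
    print(s, lr_at_step(s, total_steps=1000))
```

With these settings the rate peaks at 5e-7 when warmup ends and never drops below 5e-8.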

Evaluation

########## First turn ##########
                      score
model           turn
Llama3-iter-dpo 1      8.55
########## Second turn ##########
                        score
model           turn
Llama3-iter-dpo 2     7.95625
########## Average ##########
                        score
model
Llama3-iter-dpo      8.253125
Llama3-sft-baseline  7.69
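As a quick check of the arithmetic, the reported average is the plain mean of the two turn scores, assuming both turns cover the same number of questions. The variable names below are illustrative only.

```python
# Scores copied from the evaluation output above.
first_turn = 8.55
second_turn = 7.95625
baseline_average = 7.69

# Unweighted mean of the two turns.
average = (first_turn + second_turn) / 2

# Absolute gain of the iterative-DPO model over the SFT baseline.
gain = average - baseline_average

print(round(average, 6), round(gain, 6))
```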
Model size (Safetensors): 8.03B params
Tensor type: BF16