---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-cdpo_0.15-beta_0.1
  results: []
---


# gpt-imdb-cdpo_0.15-beta_0.1

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb). The training dataset was not recorded in this card.
The final checkpoint (step 7000) achieves the following results on the evaluation set:
- Loss: 0.5181
- Rewards/chosen: -0.6104
- Rewards/rejected: -1.9969
- Rewards/accuracies: 0.9271
- Rewards/margins: 1.3866
- Logps/rejected: -283.6544
- Logps/chosen: -241.3688
- Logits/rejected: -36.1797
- Logits/chosen: -37.0193
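
The reward metrics above are related by simple arithmetic, which the sketch below checks. It assumes the usual DPO convention in which each reward is `beta` times the policy-minus-reference log-probability of a completion. The values `BETA = 0.1` and `LABEL_SMOOTHING = 0.15` are read off the model name (`cdpo_0.15-beta_0.1`) and are assumptions, not settings confirmed by this card. The `cdpo_loss` helper is a hypothetical illustration of a label-smoothed (conservative) DPO objective, not necessarily the exact loss used in training.

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = -0.6104
rewards_rejected = -1.9969
reported_margin = 1.3866

# Assumptions inferred from the model name, not confirmed by the card.
BETA = 0.1              # from "beta_0.1"
LABEL_SMOOTHING = 0.15  # from "cdpo_0.15"

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Under the assumed convention reward = BETA * (policy logp - reference logp),
# dividing by BETA recovers the implied log-probability shift.
chosen_logratio = rewards_chosen / BETA
rejected_logratio = rewards_rejected / BETA


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(reward_margin: float, eps: float) -> float:
    """Label-smoothed DPO loss for a single reward margin (hypothetical helper)."""
    return -(1.0 - eps) * log_sigmoid(reward_margin) - eps * log_sigmoid(-reward_margin)


print(f"margin from rewards: {margin:.4f} (reported: {reported_margin})")
print(f"implied log-ratio shift, chosen: {chosen_logratio:.3f}")
print(f"implied log-ratio shift, rejected: {rejected_logratio:.3f}")
print(f"loss at the mean margin: {cdpo_loss(margin, LABEL_SMOOTHING):.4f}")
```

The recomputed margin agrees with the reported one up to rounding. The loss evaluated at the mean margin is not expected to equal the reported mean loss of 0.5181, because the loss is averaged over examples with different margins.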

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- num_epochs: 3
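
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the listed `learning_rate` and `lr_scheduler_warmup_steps`. The function `cosine_lr` is a hypothetical helper written for illustration. `TOTAL_STEPS = 7200` is an assumption extrapolated from the training log (step 7000 falls at epoch 2.92 of 3); the card does not state the total step count.

```python
import math


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 1e-5, warmup_steps: int = 150) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: about 7200 optimizer steps in total (not stated in the card).
TOTAL_STEPS = 7200

for step in (0, 75, 150, 3675, 7200):
    print(f"step {step:>5}: lr = {cosine_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the learning rate peaks at step 150 and has decayed close to zero by the last logged evaluation at step 7000.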

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5541        | 0.21  | 500  | 0.5598          | -0.1801        | -1.1214          | 0.8417             | 0.9413          | -274.8995      | -237.0667    | -33.1267        | -34.0864      |
| 0.5399        | 0.42  | 1000 | 0.5555          | -0.4075        | -1.5309          | 0.8604             | 1.1234          | -278.9942      | -239.3399    | -36.6366        | -37.5032      |
| 0.5379        | 0.63  | 1500 | 0.5445          | -0.5885        | -1.8167          | 0.8750             | 1.2282          | -281.8521      | -241.1506    | -34.0236        | -34.9075      |
| 0.5224        | 0.83  | 2000 | 0.5347          | -0.4581        | -1.7693          | 0.8917             | 1.3112          | -281.3783      | -239.8462    | -34.9412        | -35.8186      |
| 0.4992        | 1.04  | 2500 | 0.5318          | -0.5998        | -1.9222          | 0.9000             | 1.3224          | -282.9072      | -241.2631    | -34.8041        | -35.6967      |
| 0.5654        | 1.25  | 3000 | 0.5308          | -0.5502        | -1.9299          | 0.9021             | 1.3797          | -282.9844      | -240.7672    | -35.6718        | -36.5937      |
| 0.5382        | 1.46  | 3500 | 0.5247          | -0.4952        | -1.8522          | 0.9125             | 1.3570          | -282.2072      | -240.2172    | -35.7229        | -36.6547      |
| 0.5409        | 1.67  | 4000 | 0.5220          | -0.5742        | -1.9755          | 0.9292             | 1.4013          | -283.4403      | -241.0072    | -36.4780        | -37.3339      |
| 0.4911        | 1.88  | 4500 | 0.5186          | -0.6281        | -2.0249          | 0.9271             | 1.3967          | -283.9341      | -241.5466    | -36.1014        | -36.8989      |
| 0.5007        | 2.08  | 5000 | 0.5170          | -0.6115        | -2.0085          | 0.9312             | 1.3969          | -283.7699      | -241.3805    | -36.7092        | -37.5360      |
| 0.4714        | 2.29  | 5500 | 0.5166          | -0.5400        | -1.9265          | 0.9229             | 1.3865          | -282.9501      | -240.6650    | -36.1382        | -36.9914      |
| 0.5159        | 2.50  | 6000 | 0.5168          | -0.5925        | -1.9754          | 0.9271             | 1.3829          | -283.4395      | -241.1906    | -35.9587        | -36.8156      |
| 0.5103        | 2.71  | 6500 | 0.5171          | -0.6197        | -2.0190          | 0.9333             | 1.3993          | -283.8753      | -241.4619    | -36.0316        | -36.8825      |
| 0.5049        | 2.92  | 7000 | 0.5181          | -0.6104        | -1.9969          | 0.9271             | 1.3866          | -283.6544      | -241.3688    | -36.1797        | -37.0193      |
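
The table can be scanned programmatically to see which logged checkpoint scores best on each metric. The sketch below copies the step, validation loss, and reward accuracy columns from the table; the variable names are illustrative and not part of any training script.

```python
# (step, validation loss, rewards/accuracies), copied from the table above.
eval_log = [
    (500, 0.5598, 0.8417),
    (1000, 0.5555, 0.8604),
    (1500, 0.5445, 0.875),
    (2000, 0.5347, 0.8917),
    (2500, 0.5318, 0.9000),
    (3000, 0.5308, 0.9021),
    (3500, 0.5247, 0.9125),
    (4000, 0.5220, 0.9292),
    (4500, 0.5186, 0.9271),
    (5000, 0.5170, 0.9312),
    (5500, 0.5166, 0.9229),
    (6000, 0.5168, 0.9271),
    (6500, 0.5171, 0.9333),
    (7000, 0.5181, 0.9271),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(eval_log, key=lambda row: row[1])
# Checkpoint with the highest reward accuracy.
best_by_accuracy = max(eval_log, key=lambda row: row[2])

print(f"lowest validation loss: step {best_by_loss[0]} ({best_by_loss[1]})")
print(f"highest reward accuracy: step {best_by_accuracy[0]} ({best_by_accuracy[2]})")
```

Validation loss bottoms out at step 5500 and drifts up slightly afterwards, so the final checkpoint at step 7000 is close to, but not exactly, the best logged value on either metric.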


### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0