---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-ipo-beta_0.1
  results: []
---


# gpt-imdb-ipo-beta_0.1

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb). The training dataset was not recorded by the Trainer.
It achieves the following results on the evaluation set:
- Step: 6500
- Loss: 11.7007
- Rewards/chosen: -0.0852
- Rewards/rejected: -0.4480
- Rewards/accuracies: 0.9000
- Rewards/margins: 0.3628
- Logps/rejected: -268.1655
- Logps/chosen: -236.1172
- Logits/rejected: -31.0236
- Logits/chosen: -31.2283
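
The reward metrics listed above use the naming common to preference-optimization trainers. As a rough sketch only, and assuming the conventional definition (not confirmed by this card) in which the implicit reward is `beta` times the log-probability gap between the trained policy and a frozen reference model, the margin and accuracy could be computed as follows. The function names, the toy log-probabilities, and `beta = 0.1` (read off the model name) are illustrative assumptions.

```python
from statistics import mean

BETA = 0.1  # assumption: inferred from the "beta_0.1" suffix in the model name


def implicit_reward(policy_logp, ref_logp, beta=BETA):
    """Implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def preference_metrics(pairs, beta=BETA):
    """Aggregate reward metrics over a list of preference pairs.

    Each pair is a dict holding summed log-probabilities of the chosen and
    rejected completions under the policy and under the reference model.
    """
    chosen = [implicit_reward(p["policy_chosen"], p["ref_chosen"], beta) for p in pairs]
    rejected = [implicit_reward(p["policy_rejected"], p["ref_rejected"], beta) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        # Fraction of pairs where the chosen completion gets the higher reward.
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
    }


# Toy, made-up log-probabilities purely for illustration (not from this model).
toy_pairs = [
    {"policy_chosen": -236.0, "ref_chosen": -235.0,
     "policy_rejected": -268.0, "ref_rejected": -264.0},
    {"policy_chosen": -120.0, "ref_chosen": -119.0,
     "policy_rejected": -130.0, "ref_rejected": -130.5},
]
metrics = preference_metrics(toy_pairs)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this definition the mean margin is the mean chosen reward minus the mean rejected reward, which is the relationship the reported Rewards/margins value has to Rewards/chosen and Rewards/rejected.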

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- training_steps: 7197
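
To make the schedule concrete, the sketch below evaluates a cosine learning-rate schedule with linear warmup using the values listed above. It assumes the common formulation (linear ramp from zero over the warmup steps, then a half-cosine decay to zero at the final step); the exact scheduler implementation used for this run is not stated in the card, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 1e-5       # learning_rate
WARMUP_STEPS = 150   # lr_scheduler_warmup_steps
TOTAL_STEPS = 7197   # training_steps


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to peak_lr, then half-cosine decay
    from peak_lr down to 0 at the last training step.
    """
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the rate at a few points: start, end of warmup, mid-run, final step.
for step in (0, 150, 3500, 7197):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

The rate peaks at `learning_rate` exactly when warmup ends (step 150) and decays smoothly afterwards, so the later evaluation checkpoints are trained with a much smaller step size than the early ones.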

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 18.8120       | 0.21  | 500  | 29.2155         | 0.0458         | -0.2317          | 0.7875             | 0.2775          | -266.0027      | -234.8074    | -33.9160        | -34.3504      |
| 13.7881       | 0.42  | 1000 | 24.1460         | -0.0697        | -0.3582          | 0.7625             | 0.2885          | -267.2670      | -235.9622    | -35.0526        | -35.3757      |
| 27.0047       | 0.63  | 1500 | 39.7182         | -0.1370        | -0.4692          | 0.7875             | 0.3322          | -268.3775      | -236.6354    | -32.1933        | -32.4137      |
| 19.7751       | 0.83  | 2000 | 40.6223         | -0.0674        | -0.4210          | 0.7729             | 0.3536          | -267.8954      | -235.9392    | -31.7349        | -31.9095      |
| 9.5381        | 1.04  | 2500 | 20.9269         | -0.1155        | -0.4866          | 0.8146             | 0.3712          | -268.5513      | -236.4198    | -32.1382        | -32.3448      |
| 20.3498       | 1.25  | 3000 | 29.2158         | -0.0629        | -0.4040          | 0.8208             | 0.3410          | -267.7249      | -235.8945    | -31.7900        | -32.1080      |
| 20.4018       | 1.46  | 3500 | 20.8452         | -0.0350        | -0.3582          | 0.8271             | 0.3232          | -267.2670      | -235.6155    | -31.3911        | -31.6578      |
| 17.4506       | 1.67  | 4000 | 16.4207         | -0.1258        | -0.4841          | 0.8438             | 0.3583          | -268.5259      | -236.5234    | -31.5718        | -31.7727      |
| 7.7045        | 1.88  | 4500 | 14.3286         | -0.0659        | -0.4275          | 0.8750             | 0.3616          | -267.9600      | -235.9239    | -31.3055        | -31.4702      |
| 9.4274        | 2.08  | 5000 | 12.6249         | -0.1037        | -0.4565          | 0.8687             | 0.3528          | -268.2499      | -236.3019    | -31.4025        | -31.6122      |
| 7.7699        | 2.29  | 5500 | 12.3366         | -0.0787        | -0.4337          | 0.8708             | 0.3550          | -268.0224      | -236.0526    | -30.8436        | -31.0563      |
| 9.2038        | 2.50  | 6000 | 12.2158         | -0.0882        | -0.4430          | 0.8937             | 0.3548          | -268.1148      | -236.1471    | -30.7819        | -30.9884      |
| 11.4596       | 2.71  | 6500 | 11.7007         | -0.0852        | -0.4480          | 0.9000             | 0.3628          | -268.1655      | -236.1172    | -31.0236        | -31.2283      |
| 9.6351        | 2.92  | 7000 | 12.0082         | -0.0805        | -0.4417          | 0.8958             | 0.3612          | -268.1027      | -236.0704    | -31.0790        | -31.2840      |
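
As a quick sanity check on the table above, the following sketch copies a few of its columns into Python, verifies that each reported margin equals the chosen reward minus the rejected reward (within rounding), and picks the checkpoint with the lowest validation loss. The helper names and the tolerance are illustrative choices, not part of the training code.

```python
# Columns copied from the table:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (500, 29.2155, 0.0458, -0.2317, 0.2775),
    (1000, 24.1460, -0.0697, -0.3582, 0.2885),
    (1500, 39.7182, -0.1370, -0.4692, 0.3322),
    (2000, 40.6223, -0.0674, -0.4210, 0.3536),
    (2500, 20.9269, -0.1155, -0.4866, 0.3712),
    (3000, 29.2158, -0.0629, -0.4040, 0.3410),
    (3500, 20.8452, -0.0350, -0.3582, 0.3232),
    (4000, 16.4207, -0.1258, -0.4841, 0.3583),
    (4500, 14.3286, -0.0659, -0.4275, 0.3616),
    (5000, 12.6249, -0.1037, -0.4565, 0.3528),
    (5500, 12.3366, -0.0787, -0.4337, 0.3550),
    (6000, 12.2158, -0.0882, -0.4430, 0.3548),
    (6500, 11.7007, -0.0852, -0.4480, 0.3628),
    (7000, 12.0082, -0.0805, -0.4417, 0.3612),
]

TOLERANCE = 2e-4  # the table is rounded to four decimals


def margin_mismatches(rows, tol=TOLERANCE):
    """Return the steps where margin != chosen - rejected beyond rounding."""
    return [step for step, _, chosen, rejected, margin in rows
            if abs((chosen - rejected) - margin) > tol]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


print("margin mismatches:", margin_mismatches(ROWS))
print("lowest validation loss at step", best_row(ROWS)[0])
```

Every row passes the margin check, and the lowest validation loss (11.7007) occurs at step 6500, the step reported in the evaluation summary.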


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0