AlekseyKorshuk/gpt2-jokes

@@ -3,7 +3,7 @@ license: mit
 tags:
 - generated_from_trainer
 datasets:
-- Fraser/short-jokes
 metrics:
 - accuracy
 model-index:
@@ -13,15 +13,15 @@ model-index:
       name: Causal Language Modeling
       type: text-generation
     dataset:
-      name: Fraser/short-jokes
-      type: Fraser/short-jokes
       config: default
       split: train[:5%]
       args: default
     metrics:
     - name: Accuracy
       type: accuracy
-      value: 0.8760281609284458
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -29,10 +29,10 @@ should probably proofread and complete it, then remove this comment. -->
 # gpt2-jokes
-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the Fraser/short-jokes dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6851
-- Accuracy: 0.8760
 ## Model description
@@ -52,19 +52,38 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
-- train_batch_size: 8
-- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- total_train_batch_size: 32
-- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 1.0
 ### Training results
 ### Framework versions

 tags:
 - generated_from_trainer
 datasets:
+- short-jokes
 metrics:
 - accuracy
 model-index:
       name: Causal Language Modeling
       type: text-generation
     dataset:
+      name: short-jokes
+      type: short-jokes
       config: default
       split: train[:5%]
       args: default
     metrics:
     - name: Accuracy
       type: accuracy
+      value: 0.8795477617316698
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # gpt2-jokes
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the short-jokes dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6748
+- Accuracy: 0.8795
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
+- train_batch_size: 32
+- eval_batch_size: 32
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
+- total_train_batch_size: 128
+- total_eval_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 1.0
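
The total batch sizes listed above follow from the per-device batch size multiplied by the number of devices. The sketch below reproduces that arithmetic for both revisions of the card. The helper name `total_batch_size` is hypothetical, and the gradient accumulation factor of 1 is an assumption, since the card does not list one.

```python
def total_batch_size(per_device_batch_size: int,
                     num_devices: int,
                     gradient_accumulation_steps: int = 1) -> int:
    """Effective batch size per optimizer step in data-parallel training.

    gradient_accumulation_steps defaults to 1 (an assumption; the card
    does not report this value).
    """
    return per_device_batch_size * num_devices * gradient_accumulation_steps


# Values taken from the hyperparameter lists in both revisions of the card.
old_total = total_batch_size(per_device_batch_size=8, num_devices=4)
new_total = total_batch_size(per_device_batch_size=32, num_devices=4)

print(old_total, new_total)  # 32 128
```

With 4 devices, raising the per-device batch size from 8 to 32 accounts for the change in total batch size from 32 to 128.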
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| No log        | 0.06  | 100  | 0.7285          | 0.8732   |
+| No log        | 0.12  | 200  | 0.7141          | 0.8747   |
+| No log        | 0.17  | 300  | 0.7056          | 0.8757   |
+| No log        | 0.23  | 400  | 0.6992          | 0.8764   |
+| 0.7907        | 0.29  | 500  | 0.6942          | 0.8771   |
+| 0.7907        | 0.35  | 600  | 0.6906          | 0.8777   |
+| 0.7907        | 0.41  | 700  | 0.6873          | 0.8779   |
+| 0.7907        | 0.47  | 800  | 0.6848          | 0.8782   |
+| 0.7907        | 0.52  | 900  | 0.6830          | 0.8786   |
+| 0.7105        | 0.58  | 1000 | 0.6809          | 0.8788   |
+| 0.7105        | 0.64  | 1100 | 0.6794          | 0.8790   |
+| 0.7105        | 0.7   | 1200 | 0.6780          | 0.8792   |
+| 0.7105        | 0.76  | 1300 | 0.6770          | 0.8793   |
+| 0.7105        | 0.81  | 1400 | 0.6760          | 0.8794   |
+| 0.7034        | 0.87  | 1500 | 0.6755          | 0.8794   |
+| 0.7034        | 0.93  | 1600 | 0.6750          | 0.8795   |
+| 0.7034        | 0.99  | 1700 | 0.6748          | 0.8795   |
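
The validation losses in the table can also be read as perplexities. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in the card. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# First and last validation losses from the table (steps 100 and 1700).
first_eval = perplexity(0.7285)
final_eval = perplexity(0.6748)

print(f"step 100: {first_eval:.3f}, step 1700: {final_eval:.3f}")
```

Under that assumption, perplexity falls slightly over the epoch, consistent with the small but steady decrease in validation loss.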
 ### Framework versions