End of training

- README.md +6 -37
- config.json +1 -0
- pytorch_model.bin +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -15,7 +15,12 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [Karzan/gpt2-walamakan-2](https://huggingface.co/Karzan/gpt2-walamakan-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-
+- eval_loss: 7.2866
+- eval_runtime: 1.4868
+- eval_samples_per_second: 67.261
+- eval_steps_per_second: 3.363
+- epoch: 22.99
+- step: 868
 
 ## Model description
 
@@ -45,42 +50,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 500
 - num_epochs: 30
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| No log        | 0.98  | 37   | 6.9009          |
-| 0.136         | 1.99  | 75   | 6.9015          |
-| 0.1351        | 2.99  | 113  | 6.9135          |
-| 0.1363        | 4.0   | 151  | 6.9216          |
-| 0.1363        | 4.98  | 188  | 6.9141          |
-| 0.1362        | 5.99  | 226  | 6.9270          |
-| 0.1386        | 6.99  | 264  | 6.9219          |
-| 0.1401        | 8.0   | 302  | 6.9344          |
-| 0.1401        | 8.98  | 339  | 6.9056          |
-| 0.1418        | 9.99  | 377  | 6.9461          |
-| 0.1433        | 10.99 | 415  | 6.9363          |
-| 0.1454        | 12.0  | 453  | 6.9393          |
-| 0.1454        | 12.98 | 490  | 6.9399          |
-| 0.1465        | 13.99 | 528  | 6.9567          |
-| 0.1462        | 14.99 | 566  | 6.9527          |
-| 0.1421        | 16.0  | 604  | 6.9574          |
-| 0.1421        | 16.98 | 641  | 6.9712          |
-| 0.136         | 17.99 | 679  | 6.9762          |
-| 0.1304        | 18.99 | 717  | 6.9776          |
-| 0.125         | 20.0  | 755  | 6.9827          |
-| 0.125         | 20.98 | 792  | 6.9812          |
-| 0.1211        | 21.99 | 830  | 6.9778          |
-| 0.1155        | 22.99 | 868  | 6.9991          |
-| 0.1116        | 24.0  | 906  | 7.0075          |
-| 0.1116        | 24.98 | 943  | 6.9988          |
-| 0.1077        | 25.99 | 981  | 7.0113          |
-| 0.1037        | 26.99 | 1019 | 7.0134          |
-| 0.1012        | 28.0  | 1057 | 7.0161          |
-| 0.1012        | 28.98 | 1094 | 7.0179          |
-| 0.0993        | 29.4  | 1110 | 7.0166          |
-
-
 ### Framework versions
 
 - Transformers 4.32.1
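The six `eval_*` lines added above match the metric names returned by `transformers.Trainer.evaluate()`, which is presumably how they were produced. A minimal sketch of that call; the two-sentence stand-in dataset is an assumption, since the eval data behind this commit is unknown:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Karzan/gpt2-walamakan-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny stand-in eval set; the real dataset is not part of this repository.
ds = Dataset.from_dict({"text": ["example sentence one", "a second example"]})
ds = ds.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_eval_batch_size=2),
    eval_dataset=ds,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# Returns a dict keyed eval_loss, eval_runtime, eval_samples_per_second,
# eval_steps_per_second (epoch and step are added when run mid-training).
print(trainer.evaluate())
```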
config.json
CHANGED
@@ -8,6 +8,7 @@
   "bos_token_id": 0,
   "embd_pdrop": 0.1,
   "eos_token_id": 0,
+  "gradient_checkpointing": true,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-06,
   "model_type": "gpt2",
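The new `gradient_checkpointing` key records that activation checkpointing was enabled on the model when it was saved: intermediate activations are recomputed during the backward pass instead of being held in memory, trading extra compute for a smaller footprint. A hedged sketch of the two usual ways to turn it on in `transformers` (whether the author used either exact call is an assumption):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Karzan/gpt2-walamakan-2")

# Option 1: flip it on the model directly.
model.gradient_checkpointing_enable()

# Option 2: let Trainer handle it via TrainingArguments.
args = TrainingArguments(output_dir="out", gradient_checkpointing=True)
```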
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6
 size 854378685
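Only this Git LFS pointer lives in the repository; the ~854 MB weight file it names is stored out of band and fetched on checkout. The pointer's `oid` and `size` are enough to verify a download. A sketch, where the local path is an assumption:

```python
import hashlib
import os

EXPECTED_OID = "36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6"
EXPECTED_SIZE = 854378685
path = "pytorch_model.bin"  # hypothetical local copy

# Cheap check first: the byte count from the pointer.
assert os.path.getsize(path) == EXPECTED_SIZE

# Then the content hash, streamed in 1 MiB chunks to bound memory use.
sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
assert sha.hexdigest() == EXPECTED_OID
```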
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c87c12da4937860650b9de0d3f85bbd416fc314affa57a6d939667d2999c1748
 size 4027
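training_args.bin is the pickled `TrainingArguments` object that `Trainer` writes alongside the weights; at 4 KB it holds only hyperparameters. A sketch for inspecting it (unpickling executes code, so only load files you trust, and `transformers` must be importable):

```python
import torch

# weights_only=False selects the legacy pickle path; it is required on newer
# PyTorch releases, where tensors-only loading is the default.
args = torch.load("training_args.bin", weights_only=False)

print(args.warmup_steps)      # 500 per the README's lr_scheduler_warmup_steps
print(args.num_train_epochs)  # 30 per the README's num_epochs
```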