Karzan committed
Commit 258271e
Parent: aaf78df

End of training

Files changed (4)
  1. README.md +6 -37
  2. config.json +1 -0
  3. pytorch_model.bin +1 -1
  4. training_args.bin +1 -1
README.md CHANGED
@@ -15,7 +15,12 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [Karzan/gpt2-walamakan-2](https://huggingface.co/Karzan/gpt2-walamakan-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 7.0166
+- eval_loss: 7.2866
+- eval_runtime: 1.4868
+- eval_samples_per_second: 67.261
+- eval_steps_per_second: 3.363
+- epoch: 22.99
+- step: 868
 
 ## Model description
 
@@ -45,42 +50,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 500
 - num_epochs: 30
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| No log        | 0.98  | 37   | 6.9009          |
-| 0.136         | 1.99  | 75   | 6.9015          |
-| 0.1351        | 2.99  | 113  | 6.9135          |
-| 0.1363        | 4.0   | 151  | 6.9216          |
-| 0.1363        | 4.98  | 188  | 6.9141          |
-| 0.1362        | 5.99  | 226  | 6.9270          |
-| 0.1386        | 6.99  | 264  | 6.9219          |
-| 0.1401        | 8.0   | 302  | 6.9344          |
-| 0.1401        | 8.98  | 339  | 6.9056          |
-| 0.1418        | 9.99  | 377  | 6.9461          |
-| 0.1433        | 10.99 | 415  | 6.9363          |
-| 0.1454        | 12.0  | 453  | 6.9393          |
-| 0.1454        | 12.98 | 490  | 6.9399          |
-| 0.1465        | 13.99 | 528  | 6.9567          |
-| 0.1462        | 14.99 | 566  | 6.9527          |
-| 0.1421        | 16.0  | 604  | 6.9574          |
-| 0.1421        | 16.98 | 641  | 6.9712          |
-| 0.136         | 17.99 | 679  | 6.9762          |
-| 0.1304        | 18.99 | 717  | 6.9776          |
-| 0.125         | 20.0  | 755  | 6.9827          |
-| 0.125         | 20.98 | 792  | 6.9812          |
-| 0.1211        | 21.99 | 830  | 6.9778          |
-| 0.1155        | 22.99 | 868  | 6.9991          |
-| 0.1116        | 24.0  | 906  | 7.0075          |
-| 0.1116        | 24.98 | 943  | 6.9988          |
-| 0.1077        | 25.99 | 981  | 7.0113          |
-| 0.1037        | 26.99 | 1019 | 7.0134          |
-| 0.1012        | 28.0  | 1057 | 7.0161          |
-| 0.1012        | 28.98 | 1094 | 7.0179          |
-| 0.0993        | 29.4  | 1110 | 7.0166          |
-
-
 ### Framework versions
 
 - Transformers 4.32.1
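The replacement metrics in the model card (eval_loss, eval_runtime, eval_samples_per_second, eval_steps_per_second, epoch, step) are the keys the transformers Trainer reports from an evaluation pass. As a minimal, self-contained sketch of what those numbers mean, assuming a hypothetical local checkpoint path and a stand-in sample text (the commit names neither the final repo id nor the evaluation split):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local checkpoint path; not specified by this commit.
ckpt = "./checkpoint-868"
model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Stand-in for the (unknown) evaluation split: a single sample text.
enc = tokenizer("sample evaluation text", return_tensors="pt")

start = time.perf_counter()
with torch.no_grad():
    # With labels == input_ids, a causal LM returns the shifted
    # cross-entropy loss, the quantity reported as eval_loss.
    out = model(**enc, labels=enc["input_ids"])
runtime = time.perf_counter() - start

print({
    "eval_loss": out.loss.item(),
    "eval_runtime": runtime,
    "eval_samples_per_second": 1 / runtime,  # one sample in this sketch
})
```

Over a real evaluation set, `trainer.evaluate()` aggregates the same quantities across all batches, which is where figures like 67.261 samples per second come from.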
config.json CHANGED
@@ -8,6 +8,7 @@
   "bos_token_id": 0,
   "embd_pdrop": 0.1,
   "eos_token_id": 0,
+  "gradient_checkpointing": true,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-06,
   "model_type": "gpt2",
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:af9117fb21c0cfc3a00d3ceb1e6bb2bdaafa7347f9e563808efa6ced1630ad1b
+oid sha256:36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6
 size 854378685
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6e3bc4d32fa4b42b04f532331cb1de0a83b66cf98976c9b1105638b27151ae22
+oid sha256:c87c12da4937860650b9de0d3f85bbd416fc314affa57a6d939667d2999c1748
 size 4027
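Both .bin entries are Git LFS pointer files, so each diff swaps only the SHA-256 oid while the payload lives in LFS storage; the unchanged size of pytorch_model.bin suggests the weights were updated in place rather than reshaped. A small sketch for verifying a downloaded file against the oid recorded in this commit, assuming the file sits in the current directory:

```python
import hashlib

def lfs_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# oid from the new pytorch_model.bin pointer in this commit
expected = "36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6"
assert lfs_sha256("pytorch_model.bin") == expected
```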