End of training

- README.md +6 -37
- config.json +1 -0
- pytorch_model.bin +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -15,7 +15,12 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [Karzan/gpt2-walamakan-2](https://huggingface.co/Karzan/gpt2-walamakan-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-
+- eval_loss: 7.2866
+- eval_runtime: 1.4868
+- eval_samples_per_second: 67.261
+- eval_steps_per_second: 3.363
+- epoch: 22.99
+- step: 868
 
 ## Model description
 
@@ -45,42 +50,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 500
 - num_epochs: 30
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:----:|:---------------:|
-| No log        | 0.98  | 37   | 6.9009          |
-| 0.136         | 1.99  | 75   | 6.9015          |
-| 0.1351        | 2.99  | 113  | 6.9135          |
-| 0.1363        | 4.0   | 151  | 6.9216          |
-| 0.1363        | 4.98  | 188  | 6.9141          |
-| 0.1362        | 5.99  | 226  | 6.9270          |
-| 0.1386        | 6.99  | 264  | 6.9219          |
-| 0.1401        | 8.0   | 302  | 6.9344          |
-| 0.1401        | 8.98  | 339  | 6.9056          |
-| 0.1418        | 9.99  | 377  | 6.9461          |
-| 0.1433        | 10.99 | 415  | 6.9363          |
-| 0.1454        | 12.0  | 453  | 6.9393          |
-| 0.1454        | 12.98 | 490  | 6.9399          |
-| 0.1465        | 13.99 | 528  | 6.9567          |
-| 0.1462        | 14.99 | 566  | 6.9527          |
-| 0.1421        | 16.0  | 604  | 6.9574          |
-| 0.1421        | 16.98 | 641  | 6.9712          |
-| 0.136         | 17.99 | 679  | 6.9762          |
-| 0.1304        | 18.99 | 717  | 6.9776          |
-| 0.125         | 20.0  | 755  | 6.9827          |
-| 0.125         | 20.98 | 792  | 6.9812          |
-| 0.1211        | 21.99 | 830  | 6.9778          |
-| 0.1155        | 22.99 | 868  | 6.9991          |
-| 0.1116        | 24.0  | 906  | 7.0075          |
-| 0.1116        | 24.98 | 943  | 6.9988          |
-| 0.1077        | 25.99 | 981  | 7.0113          |
-| 0.1037        | 26.99 | 1019 | 7.0134          |
-| 0.1012        | 28.0  | 1057 | 7.0161          |
-| 0.1012        | 28.98 | 1094 | 7.0179          |
-| 0.0993        | 29.4  | 1110 | 7.0166          |
-
-
 ### Framework versions
 
 - Transformers 4.32.1
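The six `eval_*` lines added above match the metric names returned by `transformers.Trainer.evaluate()`, which is presumably how they were produced. A minimal sketch of that call; the two-sentence stand-in dataset is an assumption, since the eval data behind this commit is unknown:

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Karzan/gpt2-walamakan-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tiny stand-in eval set; the real dataset is not part of this repository.
ds = Dataset.from_dict({"text": ["example sentence one", "a second example"]})
ds = ds.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_eval_batch_size=2),
    eval_dataset=ds,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# Returns a dict keyed eval_loss, eval_runtime, eval_samples_per_second,
# eval_steps_per_second (epoch and step are added when run mid-training).
print(trainer.evaluate())
```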
config.json
CHANGED
@@ -8,6 +8,7 @@
   "bos_token_id": 0,
   "embd_pdrop": 0.1,
   "eos_token_id": 0,
+  "gradient_checkpointing": true,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-06,
   "model_type": "gpt2",
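The new `gradient_checkpointing` key records that activation checkpointing was enabled on the model when it was saved: intermediate activations are recomputed during the backward pass instead of being held in memory, trading extra compute for a smaller footprint. A hedged sketch of the two usual ways to turn it on in `transformers` (whether the author used either exact call is an assumption):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Karzan/gpt2-walamakan-2")

# Option 1: flip it on the model directly.
model.gradient_checkpointing_enable()

# Option 2: let Trainer handle it via TrainingArguments.
args = TrainingArguments(output_dir="out", gradient_checkpointing=True)
```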
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6
 size 854378685
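Only this Git LFS pointer lives in the repository; the ~854 MB weight file it names is stored out of band and fetched on checkout. The pointer's `oid` and `size` are enough to verify a download. A sketch, where the local path is an assumption:

```python
import hashlib
import os

EXPECTED_OID = "36b13c98659a0a235eb2bad59060efe04e5d3930faae67cd6044aba951ff99c6"
EXPECTED_SIZE = 854378685
path = "pytorch_model.bin"  # hypothetical local copy

# Cheap check first: the byte count from the pointer.
assert os.path.getsize(path) == EXPECTED_SIZE

# Then the content hash, streamed in 1 MiB chunks to bound memory use.
sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
assert sha.hexdigest() == EXPECTED_OID
```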
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c87c12da4937860650b9de0d3f85bbd416fc314affa57a6d939667d2999c1748
 size 4027
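training_args.bin is the pickled `TrainingArguments` object that `Trainer` writes alongside the weights; at 4 KB it holds only hyperparameters. A sketch for inspecting it (unpickling executes code, so only load files you trust, and `transformers` must be importable):

```python
import torch

# weights_only=False selects the legacy pickle path; it is required on newer
# PyTorch releases, where tensors-only loading is the default.
args = torch.load("training_args.bin", weights_only=False)

print(args.warmup_steps)      # 500 per the README's lr_scheduler_warmup_steps
print(args.num_train_epochs)  # 30 per the README's num_epochs
```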