lilyray committed
Commit 1ece1ee · verified · 1 Parent(s): 90cfe51

distilbert-emotion-hyper

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +10 -12
  2. config.json +1 -1
  3. model.safetensors +1 -1
  4. run-0/checkpoint-4000/config.json +50 -0
  5. run-0/checkpoint-4000/model.safetensors +3 -0
  6. run-0/checkpoint-4000/optimizer.pt +3 -0
  7. run-0/checkpoint-4000/rng_state.pth +3 -0
  8. run-0/checkpoint-4000/scheduler.pt +3 -0
  9. run-0/checkpoint-4000/special_tokens_map.json +15 -0
  10. run-0/checkpoint-4000/spiece.model +3 -0
  11. run-0/checkpoint-4000/tokenizer_config.json +58 -0
  12. run-0/checkpoint-4000/trainer_state.json +91 -0
  13. run-0/checkpoint-4000/training_args.bin +3 -0
  14. run-1/checkpoint-12000/config.json +50 -0
  15. run-1/checkpoint-12000/model.safetensors +3 -0
  16. run-1/checkpoint-12000/optimizer.pt +3 -0
  17. run-1/checkpoint-12000/rng_state.pth +3 -0
  18. run-1/checkpoint-12000/scheduler.pt +3 -0
  19. run-1/checkpoint-12000/special_tokens_map.json +15 -0
  20. run-1/checkpoint-12000/spiece.model +3 -0
  21. run-1/checkpoint-12000/tokenizer_config.json +58 -0
  22. run-1/checkpoint-12000/trainer_state.json +221 -0
  23. run-1/checkpoint-12000/training_args.bin +3 -0
  24. run-1/checkpoint-16000/config.json +50 -0
  25. run-1/checkpoint-16000/model.safetensors +3 -0
  26. run-1/checkpoint-16000/optimizer.pt +3 -0
  27. run-1/checkpoint-16000/rng_state.pth +3 -0
  28. run-1/checkpoint-16000/scheduler.pt +3 -0
  29. run-1/checkpoint-16000/special_tokens_map.json +15 -0
  30. run-1/checkpoint-16000/spiece.model +3 -0
  31. run-1/checkpoint-16000/tokenizer_config.json +58 -0
  32. run-1/checkpoint-16000/trainer_state.json +286 -0
  33. run-1/checkpoint-16000/training_args.bin +3 -0
  34. run-1/checkpoint-4000/config.json +50 -0
  35. run-1/checkpoint-4000/model.safetensors +3 -0
  36. run-1/checkpoint-4000/optimizer.pt +3 -0
  37. run-1/checkpoint-4000/rng_state.pth +3 -0
  38. run-1/checkpoint-4000/scheduler.pt +3 -0
  39. run-1/checkpoint-4000/special_tokens_map.json +15 -0
  40. run-1/checkpoint-4000/spiece.model +3 -0
  41. run-1/checkpoint-4000/tokenizer_config.json +58 -0
  42. run-1/checkpoint-4000/trainer_state.json +91 -0
  43. run-1/checkpoint-4000/training_args.bin +3 -0
  44. run-1/checkpoint-8000/config.json +50 -0
  45. run-1/checkpoint-8000/model.safetensors +3 -0
  46. run-1/checkpoint-8000/optimizer.pt +3 -0
  47. run-1/checkpoint-8000/rng_state.pth +3 -0
  48. run-1/checkpoint-8000/scheduler.pt +3 -0
  49. run-1/checkpoint-8000/special_tokens_map.json +15 -0
  50. run-1/checkpoint-8000/spiece.model +3 -0
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 license: apache-2.0
-base_model: albert-base-v2
+base_model: lilyray/albert_emotion
 tags:
 - generated_from_trainer
 datasets:
@@ -22,7 +22,7 @@ model-index:
   metrics:
   - name: Accuracy
     type: accuracy
-    value: 0.9325
+    value: 0.9295
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -30,10 +30,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # albert_emotion
 
-This model is a fine-tuned version of [albert-base-v2](https://huggingface.co/albert-base-v2) on the emotion dataset.
+This model is a fine-tuned version of [lilyray/albert_emotion](https://huggingface.co/lilyray/albert_emotion) on the emotion dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1943
-- Accuracy: 0.9325
+- Loss: 0.2391
+- Accuracy: 0.9295
 
 ## Model description
 
@@ -52,21 +52,19 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 8
+- learning_rate: 9.363600088100325e-06
+- train_batch_size: 4
 - eval_batch_size: 8
-- seed: 42
+- seed: 19
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 3
+- num_epochs: 1
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
-| 0.3186        | 1.0   | 2000 | 0.3021          | 0.916    |
-| 0.2018        | 2.0   | 4000 | 0.2196          | 0.934    |
-| 0.1207        | 3.0   | 6000 | 0.1971          | 0.936    |
+| 0.1744        | 1.0   | 4000 | 0.2001          | 0.938    |
 
 
 ### Framework versions
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "albert-base-v2",
+  "_name_or_path": "lilyray/albert_emotion",
   "architectures": [
     "AlbertForSequenceClassification"
   ],
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a3bd7be388a3134ff3089b77cb56bbf7183b56379d3247445a3c02cb5edb4ccb
+oid sha256:6c9f4ca220fb8c8f18ad50a94a4835cce04a83390d626d73001af82f099061c9
 size 46756216
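The `model.safetensors` entries in this diff are Git LFS pointer files, not the weights themselves: three `key value` lines giving the spec version, a `sha256` object id, and the blob size in bytes. A minimal sketch of parsing such a pointer (the pointer text is copied from the diff above):

```python
# Parse a Git LFS pointer file: newline-separated "key value" pairs.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:6c9f4ca220fb8c8f18ad50a94a4835cce04a83390d626d73001af82f099061c9
size 46756216"""

# Split each line on the first space into a key/value pair.
fields = dict(line.split(" ", 1) for line in pointer_text.splitlines())
algo, _, digest = fields["oid"].partition(":")  # "sha256" and the hex digest
size_bytes = int(fields["size"])

print(algo, len(digest), size_bytes)  # sha256 64 46756216
```

Cloning without LFS installed fetches only these small pointers; `git lfs pull` resolves them to the actual blobs.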
run-0/checkpoint-4000/config.json ADDED
@@ -0,0 +1,50 @@
+{
+  "_name_or_path": "lilyray/albert_emotion",
+  "architectures": [
+    "AlbertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0,
+  "bos_token_id": 2,
+  "classifier_dropout_prob": 0.1,
+  "down_scale_factor": 1,
+  "embedding_size": 128,
+  "eos_token_id": 3,
+  "gap_size": 0,
+  "hidden_act": "gelu_new",
+  "hidden_dropout_prob": 0,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5"
+  },
+  "initializer_range": 0.02,
+  "inner_group_num": 1,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "albert",
+  "net_structure_type": 0,
+  "num_attention_heads": 12,
+  "num_hidden_groups": 1,
+  "num_hidden_layers": 12,
+  "num_memory_blocks": 0,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2",
+  "type_vocab_size": 2,
+  "vocab_size": 30000
+}
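The checkpoint config above ships the generic `LABEL_0`…`LABEL_5` names. The emotion dataset's conventional class order is sadness, joy, love, anger, fear, surprise (an assumption based on the dair-ai/emotion dataset, not stated in this diff); a sketch of patching readable names into such a config:

```python
# Class order is the dair-ai/emotion convention -- an assumption, since the
# config in this diff only carries the generic LABEL_i placeholders.
emotion_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

generic_id2label = {str(i): f"LABEL_{i}" for i in range(6)}   # as in config.json
id2label = {str(i): name for i, name in enumerate(emotion_names)}
label2id = {name: i for i, name in enumerate(emotion_names)}

print(id2label["3"])  # anger
```

With readable names in place, downstream inference code can report class names instead of placeholder labels.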
run-0/checkpoint-4000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6c9f4ca220fb8c8f18ad50a94a4835cce04a83390d626d73001af82f099061c9
+size 46756216
run-0/checkpoint-4000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:04baaddef98b9b3d10f7af173f5da9ff1ee547ec109053003b96cfd7897f02ae
+size 93528589
run-0/checkpoint-4000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ec7a852da27c217b438bad7e3dc6e4654f3e185131a6f1c6fdced0f575980260
+size 14244
run-0/checkpoint-4000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6f9ad246437bca3da1f271ab4e8c9d6670a66674291e9393e21a05eefa2c3667
+size 1064
run-0/checkpoint-4000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+{
+  "bos_token": "[CLS]",
+  "cls_token": "[CLS]",
+  "eos_token": "[SEP]",
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "[SEP]",
+  "unk_token": "<unk>"
+}
run-0/checkpoint-4000/spiece.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fefb02b667a6c5c2fe27602d28e5fb3428f66ab89c7d6f388e7c8d44a02d0336
+size 760289
run-0/checkpoint-4000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "[CLS]",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "eos_token": "[SEP]",
+  "keep_accents": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "remove_space": true,
+  "sep_token": "[SEP]",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "AlbertTokenizer",
+  "unk_token": "<unk>"
+}
run-0/checkpoint-4000/trainer_state.json ADDED
@@ -0,0 +1,91 @@
+{
+  "best_metric": 0.20014449954032898,
+  "best_model_checkpoint": "./albert_emotion/run-0/checkpoint-4000",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 4000,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.12,
+      "grad_norm": 2.5895392894744873,
+      "learning_rate": 8.193150077087784e-06,
+      "loss": 0.1782,
+      "step": 500
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.041996244341135025,
+      "learning_rate": 7.022700066075244e-06,
+      "loss": 0.2179,
+      "step": 1000
+    },
+    {
+      "epoch": 0.38,
+      "grad_norm": 0.02129148505628109,
+      "learning_rate": 5.852250055062703e-06,
+      "loss": 0.2007,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 0.011637452058494091,
+      "learning_rate": 4.6818000440501625e-06,
+      "loss": 0.1845,
+      "step": 2000
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.7839567065238953,
+      "learning_rate": 3.511350033037622e-06,
+      "loss": 0.1674,
+      "step": 2500
+    },
+    {
+      "epoch": 0.75,
+      "grad_norm": 21.355159759521484,
+      "learning_rate": 2.3409000220250813e-06,
+      "loss": 0.1903,
+      "step": 3000
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.0020764770451933146,
+      "learning_rate": 1.1704500110125406e-06,
+      "loss": 0.1459,
+      "step": 3500
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.024950213730335236,
+      "learning_rate": 0.0,
+      "loss": 0.1744,
+      "step": 4000
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.938,
+      "eval_loss": 0.20014449954032898,
+      "eval_runtime": 23.7683,
+      "eval_samples_per_second": 84.146,
+      "eval_steps_per_second": 10.518,
+      "step": 4000
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 4000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "total_flos": 382520819712000.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 9.363600088100325e-06,
+    "num_train_epochs": 1,
+    "per_device_train_batch_size": 4,
+    "seed": 19
+  }
+}
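The logged learning rates in this trainer state are consistent with the linear scheduler named in the README: the trial's `learning_rate` (9.363600088100325e-06) decays to zero over `max_steps` = 4000 with no warmup. A sketch verifying that, with the constants taken from the `trainer_state.json` above:

```python
# Linear decay, no warmup: lr(step) = lr0 * (max_steps - step) / max_steps.
lr0, max_steps = 9.363600088100325e-06, 4000

def linear_lr(step):
    return lr0 * (max_steps - step) / max_steps

# Logged values from log_history above.
assert abs(linear_lr(500) - 8.193150077087784e-06) < 1e-15   # step 500
assert abs(linear_lr(2000) - 4.6818000440501625e-06) < 1e-15  # step 2000
assert linear_lr(4000) == 0.0                                 # final step

# The eval throughput is likewise consistent with the emotion validation
# split holding 2000 examples (an assumption about the dataset, not stated
# in this diff): 2000 samples / 23.7683 s rounds to the logged 84.146.
assert round(2000 / 23.7683, 3) == 84.146
```

The same check applies to any of the checkpoints in this commit; only `lr0` and `max_steps` change per trial.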
run-0/checkpoint-4000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fc54fb364a92da07bae7429ee017efe39a94ff6dde12fc714abceb63950444f4
+size 4920
run-1/checkpoint-12000/config.json ADDED
@@ -0,0 +1,50 @@
+{
+  "_name_or_path": "lilyray/albert_emotion",
+  "architectures": [
+    "AlbertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0,
+  "bos_token_id": 2,
+  "classifier_dropout_prob": 0.1,
+  "down_scale_factor": 1,
+  "embedding_size": 128,
+  "eos_token_id": 3,
+  "gap_size": 0,
+  "hidden_act": "gelu_new",
+  "hidden_dropout_prob": 0,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5"
+  },
+  "initializer_range": 0.02,
+  "inner_group_num": 1,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "albert",
+  "net_structure_type": 0,
+  "num_attention_heads": 12,
+  "num_hidden_groups": 1,
+  "num_hidden_layers": 12,
+  "num_memory_blocks": 0,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2",
+  "type_vocab_size": 2,
+  "vocab_size": 30000
+}
run-1/checkpoint-12000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d74e7e217204e16e12d1470df069950545a7c8e6d11c2125ef6ac811356e71e6
+size 46756216
run-1/checkpoint-12000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bd6db33e57cdc795f7370fbbc9657ead92277c1830798eb20e0916ad06830536
+size 93528589
run-1/checkpoint-12000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:76facf651161fc3ecb4108718251c45d084d225bbcad2b6d151a8274c5618459
+size 14244
run-1/checkpoint-12000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3b178e4c67824c76c5bfa9fa17c1cb191754d2a695b44af06c37ebd566a5277e
+size 1064
run-1/checkpoint-12000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+{
+  "bos_token": "[CLS]",
+  "cls_token": "[CLS]",
+  "eos_token": "[SEP]",
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "[SEP]",
+  "unk_token": "<unk>"
+}
run-1/checkpoint-12000/spiece.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fefb02b667a6c5c2fe27602d28e5fb3428f66ab89c7d6f388e7c8d44a02d0336
+size 760289
run-1/checkpoint-12000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "[CLS]",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "eos_token": "[SEP]",
+  "keep_accents": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "remove_space": true,
+  "sep_token": "[SEP]",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "AlbertTokenizer",
+  "unk_token": "<unk>"
+}
run-1/checkpoint-12000/trainer_state.json ADDED
@@ -0,0 +1,221 @@
+{
+  "best_metric": 0.24552112817764282,
+  "best_model_checkpoint": "./albert_emotion/run-1/checkpoint-4000",
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 12000,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.027287261560559273,
+      "learning_rate": 1.1966787450728162e-06,
+      "loss": 0.0943,
+      "step": 500
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.007144883740693331,
+      "learning_rate": 1.158076204909177e-06,
+      "loss": 0.1192,
+      "step": 1000
+    },
+    {
+      "epoch": 0.38,
+      "grad_norm": 0.017046278342604637,
+      "learning_rate": 1.1194736647455377e-06,
+      "loss": 0.1077,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 20.33992576599121,
+      "learning_rate": 1.0808711245818985e-06,
+      "loss": 0.145,
+      "step": 2000
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.004928311333060265,
+      "learning_rate": 1.0422685844182593e-06,
+      "loss": 0.1311,
+      "step": 2500
+    },
+    {
+      "epoch": 0.75,
+      "grad_norm": 0.009308135136961937,
+      "learning_rate": 1.00366604425462e-06,
+      "loss": 0.1226,
+      "step": 3000
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.002149054082110524,
+      "learning_rate": 9.650635040909807e-07,
+      "loss": 0.1444,
+      "step": 3500
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.0062008146196603775,
+      "learning_rate": 9.264609639273415e-07,
+      "loss": 0.1587,
+      "step": 4000
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.937,
+      "eval_loss": 0.24552112817764282,
+      "eval_runtime": 23.8526,
+      "eval_samples_per_second": 83.848,
+      "eval_steps_per_second": 10.481,
+      "step": 4000
+    },
+    {
+      "epoch": 1.12,
+      "grad_norm": 22.703540802001953,
+      "learning_rate": 8.878584237637023e-07,
+      "loss": 0.1139,
+      "step": 4500
+    },
+    {
+      "epoch": 1.25,
+      "grad_norm": 0.018092291429638863,
+      "learning_rate": 8.492558836000631e-07,
+      "loss": 0.1361,
+      "step": 5000
+    },
+    {
+      "epoch": 1.38,
+      "grad_norm": 0.011323424987494946,
+      "learning_rate": 8.106533434364239e-07,
+      "loss": 0.123,
+      "step": 5500
+    },
+    {
+      "epoch": 1.5,
+      "grad_norm": 0.003476408077403903,
+      "learning_rate": 7.720508032727847e-07,
+      "loss": 0.12,
+      "step": 6000
+    },
+    {
+      "epoch": 1.62,
+      "grad_norm": 0.026127604767680168,
+      "learning_rate": 7.334482631091454e-07,
+      "loss": 0.1196,
+      "step": 6500
+    },
+    {
+      "epoch": 1.75,
+      "grad_norm": 32.6096076965332,
+      "learning_rate": 6.948457229455062e-07,
+      "loss": 0.1199,
+      "step": 7000
+    },
+    {
+      "epoch": 1.88,
+      "grad_norm": 0.0031733482610434294,
+      "learning_rate": 6.562431827818669e-07,
+      "loss": 0.1074,
+      "step": 7500
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.014037689194083214,
+      "learning_rate": 6.176406426182277e-07,
+      "loss": 0.1336,
+      "step": 8000
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.9335,
+      "eval_loss": 0.2784283757209778,
+      "eval_runtime": 23.8236,
+      "eval_samples_per_second": 83.95,
+      "eval_steps_per_second": 10.494,
+      "step": 8000
+    },
+    {
+      "epoch": 2.12,
+      "grad_norm": 0.0019384464249014854,
+      "learning_rate": 5.790381024545885e-07,
+      "loss": 0.1199,
+      "step": 8500
+    },
+    {
+      "epoch": 2.25,
+      "grad_norm": 0.007850521244108677,
+      "learning_rate": 5.404355622909492e-07,
+      "loss": 0.1195,
+      "step": 9000
+    },
+    {
+      "epoch": 2.38,
+      "grad_norm": 0.3905338943004608,
+      "learning_rate": 5.0183302212731e-07,
+      "loss": 0.1143,
+      "step": 9500
+    },
+    {
+      "epoch": 2.5,
+      "grad_norm": 0.0038829813711345196,
+      "learning_rate": 4.6323048196367076e-07,
+      "loss": 0.1005,
+      "step": 10000
+    },
+    {
+      "epoch": 2.62,
+      "grad_norm": 0.0025081851053982973,
+      "learning_rate": 4.2462794180003157e-07,
+      "loss": 0.1006,
+      "step": 10500
+    },
+    {
+      "epoch": 2.75,
+      "grad_norm": 0.0029693343676626682,
+      "learning_rate": 3.8602540163639233e-07,
+      "loss": 0.1055,
+      "step": 11000
+    },
+    {
+      "epoch": 2.88,
+      "grad_norm": 0.0075212884694337845,
+      "learning_rate": 3.474228614727531e-07,
+      "loss": 0.0892,
+      "step": 11500
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 0.01758272759616375,
+      "learning_rate": 3.0882032130911386e-07,
+      "loss": 0.1181,
+      "step": 12000
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.934,
+      "eval_loss": 0.29778438806533813,
+      "eval_runtime": 23.8749,
+      "eval_samples_per_second": 83.77,
+      "eval_steps_per_second": 10.471,
+      "step": 12000
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 16000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 4,
+  "save_steps": 500,
+  "total_flos": 1147562459136000.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": {
+    "learning_rate": 1.2352812852364554e-06,
+    "num_train_epochs": 4,
+    "per_device_train_batch_size": 4,
+    "seed": 18
+  }
+}
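In this run-1 state, `best_model_checkpoint` still points at checkpoint-4000 even after three epochs, because the validation loss worsens at each later eval. The selection logic can be sketched as picking the eval entry with the lowest `eval_loss` (the values below are copied from the `trainer_state.json` above):

```python
# (step, eval_loss) pairs from the three eval entries in log_history above.
evals = [
    (4000, 0.24552112817764282),
    (8000, 0.2784283757209778),
    (12000, 0.29778438806533813),
]

# The best checkpoint is the one with the minimum eval loss.
best_step, best_loss = min(evals, key=lambda e: e[1])
print(best_step)  # 4000
```

This matches the header fields: `best_metric` equals the epoch-1 loss and the best checkpoint is `run-1/checkpoint-4000`.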
run-1/checkpoint-12000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e21bcd66d0c63173564964a8f0cb271364f443120673a2d61a746bc1cac25c2f
+size 4920
run-1/checkpoint-16000/config.json ADDED
@@ -0,0 +1,50 @@
+{
+  "_name_or_path": "lilyray/albert_emotion",
+  "architectures": [
+    "AlbertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0,
+  "bos_token_id": 2,
+  "classifier_dropout_prob": 0.1,
+  "down_scale_factor": 1,
+  "embedding_size": 128,
+  "eos_token_id": 3,
+  "gap_size": 0,
+  "hidden_act": "gelu_new",
+  "hidden_dropout_prob": 0,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5"
+  },
+  "initializer_range": 0.02,
+  "inner_group_num": 1,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "albert",
+  "net_structure_type": 0,
+  "num_attention_heads": 12,
+  "num_hidden_groups": 1,
+  "num_hidden_layers": 12,
+  "num_memory_blocks": 0,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2",
+  "type_vocab_size": 2,
+  "vocab_size": 30000
+}
run-1/checkpoint-16000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e6f0627e728caaff09e615dd0c91be7b23e6b74b1b84009e1b5df84057bed1f
+size 46756216
run-1/checkpoint-16000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9d761d19c54c268cfe5c7d62cd792066265268e25f4cf4dcdbeb27de274299ef
+size 93528589
run-1/checkpoint-16000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b4e09177cb571c6fc30997814a2cc2e51b7c69366f36ffb4f05beb9832aec029
+size 14244
run-1/checkpoint-16000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4b72072762b71d361472d5bd69dc1c822ad8870128e9f33e92009a7fafeca88c
+size 1064
run-1/checkpoint-16000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+{
+  "bos_token": "[CLS]",
+  "cls_token": "[CLS]",
+  "eos_token": "[SEP]",
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "[SEP]",
+  "unk_token": "<unk>"
+}
run-1/checkpoint-16000/spiece.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fefb02b667a6c5c2fe27602d28e5fb3428f66ab89c7d6f388e7c8d44a02d0336
+size 760289
run-1/checkpoint-16000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "[CLS]",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "eos_token": "[SEP]",
+  "keep_accents": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "remove_space": true,
+  "sep_token": "[SEP]",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "AlbertTokenizer",
+  "unk_token": "<unk>"
+}
run-1/checkpoint-16000/trainer_state.json ADDED
@@ -0,0 +1,286 @@
+{
+  "best_metric": 0.24552112817764282,
+  "best_model_checkpoint": "./albert_emotion/run-1/checkpoint-4000",
+  "epoch": 4.0,
+  "eval_steps": 500,
+  "global_step": 16000,
+  "is_hyper_param_search": true,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.027287261560559273,
+      "learning_rate": 1.1966787450728162e-06,
+      "loss": 0.0943,
+      "step": 500
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.007144883740693331,
+      "learning_rate": 1.158076204909177e-06,
+      "loss": 0.1192,
+      "step": 1000
+    },
+    {
+      "epoch": 0.38,
+      "grad_norm": 0.017046278342604637,
+      "learning_rate": 1.1194736647455377e-06,
+      "loss": 0.1077,
+      "step": 1500
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 20.33992576599121,
+      "learning_rate": 1.0808711245818985e-06,
+      "loss": 0.145,
+      "step": 2000
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.004928311333060265,
+      "learning_rate": 1.0422685844182593e-06,
+      "loss": 0.1311,
+      "step": 2500
+    },
+    {
+      "epoch": 0.75,
+      "grad_norm": 0.009308135136961937,
+      "learning_rate": 1.00366604425462e-06,
+      "loss": 0.1226,
+      "step": 3000
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.002149054082110524,
+      "learning_rate": 9.650635040909807e-07,
+      "loss": 0.1444,
+      "step": 3500
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.0062008146196603775,
+      "learning_rate": 9.264609639273415e-07,
+      "loss": 0.1587,
+      "step": 4000
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.937,
+      "eval_loss": 0.24552112817764282,
+      "eval_runtime": 23.8526,
+      "eval_samples_per_second": 83.848,
+      "eval_steps_per_second": 10.481,
+      "step": 4000
+    },
+    {
+      "epoch": 1.12,
+      "grad_norm": 22.703540802001953,
+      "learning_rate": 8.878584237637023e-07,
+      "loss": 0.1139,
+      "step": 4500
+    },
+    {
+      "epoch": 1.25,
+      "grad_norm": 0.018092291429638863,
+      "learning_rate": 8.492558836000631e-07,
+      "loss": 0.1361,
+      "step": 5000
+    },
+    {
+      "epoch": 1.38,
+      "grad_norm": 0.011323424987494946,
+      "learning_rate": 8.106533434364239e-07,
+      "loss": 0.123,
+      "step": 5500
+    },
+    {
+      "epoch": 1.5,
+      "grad_norm": 0.003476408077403903,
+      "learning_rate": 7.720508032727847e-07,
+      "loss": 0.12,
+      "step": 6000
+    },
+    {
+      "epoch": 1.62,
+      "grad_norm": 0.026127604767680168,
+      "learning_rate": 7.334482631091454e-07,
+      "loss": 0.1196,
+      "step": 6500
+    },
+    {
+      "epoch": 1.75,
+      "grad_norm": 32.6096076965332,
+      "learning_rate": 6.948457229455062e-07,
+      "loss": 0.1199,
+      "step": 7000
+    },
+    {
+      "epoch": 1.88,
+      "grad_norm": 0.0031733482610434294,
+      "learning_rate": 6.562431827818669e-07,
+      "loss": 0.1074,
+      "step": 7500
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 0.014037689194083214,
+      "learning_rate": 6.176406426182277e-07,
+      "loss": 0.1336,
+      "step": 8000
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.9335,
+      "eval_loss": 0.2784283757209778,
+      "eval_runtime": 23.8236,
+      "eval_samples_per_second": 83.95,
+      "eval_steps_per_second": 10.494,
+      "step": 8000
+    },
+    {
+      "epoch": 2.12,
+      "grad_norm": 0.0019384464249014854,
+      "learning_rate": 5.790381024545885e-07,
+      "loss": 0.1199,
+      "step": 8500
+    },
+    {
+      "epoch": 2.25,
+      "grad_norm": 0.007850521244108677,
+      "learning_rate": 5.404355622909492e-07,
+      "loss": 0.1195,
+      "step": 9000
+    },
+    {
+      "epoch": 2.38,
+      "grad_norm": 0.3905338943004608,
+      "learning_rate": 5.0183302212731e-07,
+      "loss": 0.1143,
+      "step": 9500
+    },
+    {
+      "epoch": 2.5,
+      "grad_norm": 0.0038829813711345196,
+      "learning_rate": 4.6323048196367076e-07,
+      "loss": 0.1005,
+      "step": 10000
+    },
+    {
+      "epoch": 2.62,
+      "grad_norm": 0.0025081851053982973,
+      "learning_rate": 4.2462794180003157e-07,
+      "loss": 0.1006,
+      "step": 10500
+    },
+    {
+      "epoch": 2.75,
+      "grad_norm": 0.0029693343676626682,
+      "learning_rate": 3.8602540163639233e-07,
+      "loss": 0.1055,
+      "step": 11000
+    },
+    {
+      "epoch": 2.88,
+      "grad_norm": 0.0075212884694337845,
+      "learning_rate": 3.474228614727531e-07,
+      "loss": 0.0892,
+      "step": 11500
+    },
+    {
+      "epoch": 3.0,
+      "grad_norm": 0.01758272759616375,
+      "learning_rate": 3.0882032130911386e-07,
+      "loss": 0.1181,
+      "step": 12000
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.934,
+      "eval_loss": 0.29778438806533813,
+      "eval_runtime": 23.8749,
+      "eval_samples_per_second": 83.77,
+      "eval_steps_per_second": 10.471,
+      "step": 12000
+    },
+    {
+      "epoch": 3.12,
+      "grad_norm": 0.0022781568113714457,
+      "learning_rate": 2.702177811454746e-07,
+      "loss": 0.0989,
+      "step": 12500
+    },
+    {
+      "epoch": 3.25,
+      "grad_norm": 0.2688106596469879,
+      "learning_rate": 2.3161524098183538e-07,
+      "loss": 0.1176,
+      "step": 13000
+    },
+    {
+      "epoch": 3.38,
+      "grad_norm": 0.0026019506622105837,
+      "learning_rate": 1.9301270081819617e-07,
+      "loss": 0.0959,
+      "step": 13500
+    },
+    {
+      "epoch": 3.5,
+      "grad_norm": 121.57093048095703,
+      "learning_rate": 1.5441016065455693e-07,
231
+ "loss": 0.0767,
232
+ "step": 14000
233
+ },
234
+ {
235
+ "epoch": 3.62,
236
+ "grad_norm": 0.005118395667523146,
237
+ "learning_rate": 1.1580762049091769e-07,
238
+ "loss": 0.0944,
239
+ "step": 14500
240
+ },
241
+ {
242
+ "epoch": 3.75,
243
+ "grad_norm": 0.005758533254265785,
244
+ "learning_rate": 7.720508032727846e-08,
245
+ "loss": 0.0803,
246
+ "step": 15000
247
+ },
248
+ {
249
+ "epoch": 3.88,
250
+ "grad_norm": 0.005111873149871826,
251
+ "learning_rate": 3.860254016363923e-08,
252
+ "loss": 0.0794,
253
+ "step": 15500
254
+ },
255
+ {
256
+ "epoch": 4.0,
257
+ "grad_norm": 0.011210494674742222,
258
+ "learning_rate": 0.0,
259
+ "loss": 0.1173,
260
+ "step": 16000
261
+ },
262
+ {
263
+ "epoch": 4.0,
264
+ "eval_accuracy": 0.932,
265
+ "eval_loss": 0.31061089038848877,
266
+ "eval_runtime": 23.8961,
267
+ "eval_samples_per_second": 83.696,
268
+ "eval_steps_per_second": 10.462,
269
+ "step": 16000
270
+ }
271
+ ],
272
+ "logging_steps": 500,
273
+ "max_steps": 16000,
274
+ "num_input_tokens_seen": 0,
275
+ "num_train_epochs": 4,
276
+ "save_steps": 500,
277
+ "total_flos": 1530083278848000.0,
278
+ "train_batch_size": 4,
279
+ "trial_name": null,
280
+ "trial_params": {
281
+ "learning_rate": 1.2352812852364554e-06,
282
+ "num_train_epochs": 4,
283
+ "per_device_train_batch_size": 4,
284
+ "seed": 18
285
+ }
286
+ }
run-1/checkpoint-16000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e21bcd66d0c63173564964a8f0cb271364f443120673a2d61a746bc1cac25c2f
+ size 4920
run-1/checkpoint-4000/config.json ADDED
@@ -0,0 +1,50 @@
+ {
+ "_name_or_path": "lilyray/albert_emotion",
+ "architectures": [
+ "AlbertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0,
+ "bos_token_id": 2,
+ "classifier_dropout_prob": 0.1,
+ "down_scale_factor": 1,
+ "embedding_size": 128,
+ "eos_token_id": 3,
+ "gap_size": 0,
+ "hidden_act": "gelu_new",
+ "hidden_dropout_prob": 0,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1",
+ "2": "LABEL_2",
+ "3": "LABEL_3",
+ "4": "LABEL_4",
+ "5": "LABEL_5"
+ },
+ "initializer_range": 0.02,
+ "inner_group_num": 1,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1,
+ "LABEL_2": 2,
+ "LABEL_3": 3,
+ "LABEL_4": 4,
+ "LABEL_5": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "albert",
+ "net_structure_type": 0,
+ "num_attention_heads": 12,
+ "num_hidden_groups": 1,
+ "num_hidden_layers": 12,
+ "num_memory_blocks": 0,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "torch_dtype": "float32",
+ "transformers_version": "4.38.2",
+ "type_vocab_size": 2,
+ "vocab_size": 30000
+ }
run-1/checkpoint-4000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:156fcaf3367f7c541456accd7fef964e69c0f4257da2fcef3d580cd52b6b8399
+ size 46756216
run-1/checkpoint-4000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be4c2acf3369aa065e68e4ab27b59e1920e75ac29bad862ee9ef76fae05f986f
+ size 93528589
run-1/checkpoint-4000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb2c6a03442be17a3db9eb719b0b8bbb044bc088befa9c99aaa534d072e20c7d
+ size 14244
run-1/checkpoint-4000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b63d5c23eb8f6fcf166011a4c23ed4426dc8ebe53b8492ed0deebb7a7161c72b
+ size 1064
run-1/checkpoint-4000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "[CLS]",
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<pad>",
+ "sep_token": "[SEP]",
+ "unk_token": "<unk>"
+ }
run-1/checkpoint-4000/spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fefb02b667a6c5c2fe27602d28e5fb3428f66ab89c7d6f388e7c8d44a02d0336
+ size 760289
run-1/checkpoint-4000/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "eos_token": "[SEP]",
+ "keep_accents": false,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "remove_space": true,
+ "sep_token": "[SEP]",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "AlbertTokenizer",
+ "unk_token": "<unk>"
+ }
run-1/checkpoint-4000/trainer_state.json ADDED
@@ -0,0 +1,91 @@
+ {
+ "best_metric": 0.24552112817764282,
+ "best_model_checkpoint": "./albert_emotion/run-1/checkpoint-4000",
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 4000,
+ "is_hyper_param_search": true,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.12,
+ "grad_norm": 0.027287261560559273,
+ "learning_rate": 1.1966787450728162e-06,
+ "loss": 0.0943,
+ "step": 500
+ },
+ {
+ "epoch": 0.25,
+ "grad_norm": 0.007144883740693331,
+ "learning_rate": 1.158076204909177e-06,
+ "loss": 0.1192,
+ "step": 1000
+ },
+ {
+ "epoch": 0.38,
+ "grad_norm": 0.017046278342604637,
+ "learning_rate": 1.1194736647455377e-06,
+ "loss": 0.1077,
+ "step": 1500
+ },
+ {
+ "epoch": 0.5,
+ "grad_norm": 20.33992576599121,
+ "learning_rate": 1.0808711245818985e-06,
+ "loss": 0.145,
+ "step": 2000
+ },
+ {
+ "epoch": 0.62,
+ "grad_norm": 0.004928311333060265,
+ "learning_rate": 1.0422685844182593e-06,
+ "loss": 0.1311,
+ "step": 2500
+ },
+ {
+ "epoch": 0.75,
+ "grad_norm": 0.009308135136961937,
+ "learning_rate": 1.00366604425462e-06,
+ "loss": 0.1226,
+ "step": 3000
+ },
+ {
+ "epoch": 0.88,
+ "grad_norm": 0.002149054082110524,
+ "learning_rate": 9.650635040909807e-07,
+ "loss": 0.1444,
+ "step": 3500
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 0.0062008146196603775,
+ "learning_rate": 9.264609639273415e-07,
+ "loss": 0.1587,
+ "step": 4000
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.937,
+ "eval_loss": 0.24552112817764282,
+ "eval_runtime": 23.8526,
+ "eval_samples_per_second": 83.848,
+ "eval_steps_per_second": 10.481,
+ "step": 4000
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 16000,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 4,
+ "save_steps": 500,
+ "total_flos": 382520819712000.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": {
+ "learning_rate": 1.2352812852364554e-06,
+ "num_train_epochs": 4,
+ "per_device_train_batch_size": 4,
+ "seed": 18
+ }
+ }
run-1/checkpoint-4000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e21bcd66d0c63173564964a8f0cb271364f443120673a2d61a746bc1cac25c2f
+ size 4920
run-1/checkpoint-8000/config.json ADDED
@@ -0,0 +1,50 @@
+ {
+ "_name_or_path": "lilyray/albert_emotion",
+ "architectures": [
+ "AlbertForSequenceClassification"
+ ],
+ "attention_probs_dropout_prob": 0,
+ "bos_token_id": 2,
+ "classifier_dropout_prob": 0.1,
+ "down_scale_factor": 1,
+ "embedding_size": 128,
+ "eos_token_id": 3,
+ "gap_size": 0,
+ "hidden_act": "gelu_new",
+ "hidden_dropout_prob": 0,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0",
+ "1": "LABEL_1",
+ "2": "LABEL_2",
+ "3": "LABEL_3",
+ "4": "LABEL_4",
+ "5": "LABEL_5"
+ },
+ "initializer_range": 0.02,
+ "inner_group_num": 1,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1,
+ "LABEL_2": 2,
+ "LABEL_3": 3,
+ "LABEL_4": 4,
+ "LABEL_5": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "albert",
+ "net_structure_type": 0,
+ "num_attention_heads": 12,
+ "num_hidden_groups": 1,
+ "num_hidden_layers": 12,
+ "num_memory_blocks": 0,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "problem_type": "single_label_classification",
+ "torch_dtype": "float32",
+ "transformers_version": "4.38.2",
+ "type_vocab_size": 2,
+ "vocab_size": 30000
+ }
run-1/checkpoint-8000/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:068474bcd6bb43ca5976ef5255229e83f258e79232f717aaa948eb613cc053f9
+ size 46756216
run-1/checkpoint-8000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b377e267df21a9617e41ec2f5dafe299ed22741e515eb243ef412039f2bbfba4
+ size 93528589
run-1/checkpoint-8000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f536c4fc51ba22c8463ac798baed56358909f7916ff76c65faf7c7ab5fae3b7e
+ size 14244
run-1/checkpoint-8000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66df00838ae5a810d2c870ea68518f1032da528e2df9a2c35bd7cd9142aa7c55
+ size 1064
run-1/checkpoint-8000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "[CLS]",
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<pad>",
+ "sep_token": "[SEP]",
+ "unk_token": "<unk>"
+ }
run-1/checkpoint-8000/spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fefb02b667a6c5c2fe27602d28e5fb3428f66ab89c7d6f388e7c8d44a02d0336
+ size 760289