chansung committed
Commit becc4f8
Parent: 1155733

Model save

README.md CHANGED
@@ -2,13 +2,12 @@
  license: gemma
  library_name: peft
  tags:
- - alignment-handbook
  - trl
  - sft
  - generated_from_trainer
  base_model: google/gemma-2b
  datasets:
- - llama-duo/synth_summarize_dataset_dedup
+ - generator
  model-index:
  - name: gemma2b-summarize-gemini1_5flash-256k
    results: []
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
  # gemma2b-summarize-gemini1_5flash-256k
 
- This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the llama-duo/synth_summarize_dataset_dedup dataset.
+ This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the generator dataset.
  It achieves the following results on the evaluation set:
- - Loss: 2.5038
+ - Loss: 2.5669
 
  ## Model description
 
@@ -45,40 +44,35 @@ The following hyperparameters were used during training:
  - eval_batch_size: 8
  - seed: 42
  - distributed_type: multi-GPU
- - num_devices: 4
+ - num_devices: 8
  - gradient_accumulation_steps: 2
- - total_train_batch_size: 64
- - total_eval_batch_size: 32
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 15
+ - num_epochs: 10
 
  ### Training results
 
- | Training Loss | Epoch   | Step | Validation Loss |
- |:-------------:|:-------:|:----:|:---------------:|
- | 1.0614        | 0.9988  | 414  | 2.4760          |
- | 1.0004        | 2.0     | 829  | 2.4481          |
- | 0.9586        | 2.9988  | 1243 | 2.4426          |
- | 0.9412        | 4.0     | 1658 | 2.4496          |
- | 0.9325        | 4.9988  | 2072 | 2.4600          |
- | 0.9129        | 6.0     | 2487 | 2.4629          |
- | 0.8995        | 6.9988  | 2901 | 2.4703          |
- | 0.8999        | 8.0     | 3316 | 2.4830          |
- | 0.8762        | 8.9988  | 3730 | 2.4934          |
- | 0.8821        | 10.0    | 4145 | 2.4974          |
- | 0.8697        | 10.9988 | 4559 | 2.5013          |
- | 0.8729        | 12.0    | 4974 | 2.5031          |
- | 0.8779        | 12.9988 | 5388 | 2.5023          |
- | 0.8743        | 14.0    | 5803 | 2.5033          |
- | 0.8746        | 14.9819 | 6210 | 2.5038          |
+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.0246        | 0.9976 | 207  | 2.4550          |
+ | 0.9556        | 2.0    | 415  | 2.4530          |
+ | 0.9114        | 2.9976 | 622  | 2.4641          |
+ | 0.8927        | 4.0    | 830  | 2.4882          |
+ | 0.8752        | 4.9976 | 1037 | 2.5081          |
+ | 0.8602        | 6.0    | 1245 | 2.5277          |
+ | 0.8464        | 6.9976 | 1452 | 2.5513          |
+ | 0.8353        | 8.0    | 1660 | 2.5615          |
+ | 0.8267        | 8.9976 | 1867 | 2.5674          |
+ | 0.827         | 9.9759 | 2070 | 2.5669          |
 
 
  ### Framework versions
 
  - PEFT 0.11.1
- - Transformers 4.40.1
- - Pytorch 2.2.0+cu121
+ - Transformers 4.41.2
+ - Pytorch 2.3.1+cu121
  - Datasets 2.19.2
  - Tokenizers 0.19.1
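The updated hyperparameters are internally consistent with a per-device train batch size of 8: 8 per device × 8 devices × 2 gradient-accumulation steps gives the new total_train_batch_size of 128, and 8 × 8 devices gives the eval total of 64. With the effective batch size doubled, steps per epoch roughly halve (≈414 → ≈207), and with 10 instead of 15 epochs the total step count drops from 6210 to 2070, matching the new results table. As a minimal sketch (not the author's actual launch script), the card values map onto `transformers.TrainingArguments` roughly as follows; the per-device batch size is an inference, and settings the diff does not show (learning rate, LoRA config, dataset packing) are omitted:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed in the updated model card.
# per_device_train_batch_size=8 is an assumption inferred from
# 8 (per device) * 8 (num_devices) * 2 (grad accum) = 128 total.
training_args = TrainingArguments(
    output_dir="gemma2b-summarize-gemini1_5flash-256k",
    per_device_train_batch_size=8,   # assumed; not shown in the diff hunk
    per_device_eval_batch_size=8,    # "eval_batch_size: 8"
    gradient_accumulation_steps=2,   # "gradient_accumulation_steps: 2"
    num_train_epochs=10,             # "num_epochs: 15" -> 10
    lr_scheduler_type="cosine",      # "lr_scheduler_type: cosine"
    warmup_ratio=0.1,                # "lr_scheduler_warmup_ratio: 0.1"
    seed=42,                         # "seed: 42"
    # The Adam betas/epsilon in the card are the transformers defaults.
)
```

Run under a multi-GPU launcher such as `accelerate launch` on 8 GPUs, this corresponds to the `multi-GPU` distributed type and `num_devices: 8` reported in the card.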
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b9bf333514cb70e73dda09561ebd8fba4d5658337e542a559555c3ad57cf41ab
+ oid sha256:465c1b3666bad23b46105166da89ba228655adfb0e551b2809da6f6c1f2df5f3
  size 78480320
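`adapter_model.safetensors` is tracked with Git LFS, so the diff only swaps the pointer's SHA-256 object id; the adapter itself stays at 78,480,320 bytes, as expected when the adapter shape is unchanged. A hedged usage sketch for loading this adapter onto the base model with PEFT — the repository id below is assumed from the model name in the card and may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2b"
# Assumption: the adapter is published under this repo id; adjust to the actual namespace.
adapter_id = "llama-duo/gemma2b-summarize-gemini1_5flash-256k"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the LoRA adapter weights stored in adapter_model.safetensors.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

If a standalone checkpoint is preferred, `model.merge_and_unload()` folds the adapter into the base weights.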
all_results.json CHANGED
@@ -1,14 +1,9 @@
  {
- "epoch": 14.981905910735826,
- "eval_loss": 2.503845453262329,
- "eval_runtime": 0.5015,
- "eval_samples": 25,
- "eval_samples_per_second": 19.941,
- "eval_steps_per_second": 1.994,
- "total_flos": 4.863451355047526e+18,
- "train_loss": 0.9534509487582098,
- "train_runtime": 21285.3674,
+ "epoch": 9.975903614457831,
+ "total_flos": 3.290190024938619e+18,
+ "train_loss": 0.9333097650233099,
+ "train_runtime": 14306.303,
  "train_samples": 253412,
- "train_samples_per_second": 18.683,
- "train_steps_per_second": 0.292
+ "train_samples_per_second": 18.532,
+ "train_steps_per_second": 0.145
  }
runs/Jun10_09-38-09_48ddfe8e991f/events.out.tfevents.1718012562.48ddfe8e991f.334310.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:314c1368e9577920bf9375f6e653416745d5241d15d260dd90f822bb1a8ed8e8
- size 92554
+ oid sha256:35b82431d49a0c3e5f8201a73d7045835a5bd2c881a40f073d0db0fae3d9a474
+ size 96133
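The TensorBoard event file under `runs/` is likewise an LFS pointer, so the commit records only a new oid and a slightly larger size. A downloaded file can be checked against its pointer by recomputing the SHA-256 digest; a minimal sketch, assuming a local copy of the event file:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical local copy of the event file from this commit.
p = Path("events.out.tfevents.1718012562.48ddfe8e991f.334310.0")
print(sha256_of(p), p.stat().st_size)
# Expected to match the new pointer: oid 35b82431... and size 96133.
```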
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 14.981905910735826,
- "total_flos": 4.863451355047526e+18,
- "train_loss": 0.9534509487582098,
- "train_runtime": 21285.3674,
+ "epoch": 9.975903614457831,
+ "total_flos": 3.290190024938619e+18,
+ "train_loss": 0.9333097650233099,
+ "train_runtime": 14306.303,
  "train_samples": 253412,
- "train_samples_per_second": 18.683,
- "train_steps_per_second": 0.292
+ "train_samples_per_second": 18.532,
+ "train_steps_per_second": 0.145
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff