pszemraj committed
Commit 672f460
1 Parent(s): c4dd631

Update README.md

Files changed (1)
  1. README.md +23 -18
README.md CHANGED
@@ -48,15 +48,19 @@ model-index:

  # flan-t5-large-stacked-samsum-1024

+ <a href="https://colab.research.google.com/gist/pszemraj/a4bf61f593ebda9a8db6dc58839d9de4/brief-demo-flan-t5-stacked-samsum.ipynb">
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+ </a>
+
  This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on the `stacked-summaries/stacked-samsum-1024` dataset.

  It achieves the following results on the evaluation set:
- - Loss: 2.1311
- - Rouge1: 58.1114
- - Rouge2: 29.339
- - Rougel: 44.7611
- - Rougelsum: 54.2823
- - Gen Len: 122.364
+ - Loss: 2.1846
+ - Rouge1: 57.9637
+ - Rouge2: 28.7446
+ - Rougel: 44.3826
+ - Rougelsum: 54.0399
+ - Gen Len: 122.77

  ## Model description

@@ -65,37 +69,38 @@ More information needed
  ## Intended uses & limitations

  - max input/output is 1024 tokens
- - this is mostly a test because `samsum` is not exactly the best dataset for general purpose summarization
+ - this is mostly a test because `samsum` is not exactly the best dataset for general-purpose summarization

  ## Training and evaluation data

- More information needed
+ See the dataset card linked on this page for info

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0006
- - train_batch_size: 4
- - eval_batch_size: 2
- - seed: 2760
+ - learning_rate: 0.0001
+ - train_batch_size: 8
+ - eval_batch_size: 4
+ - seed: 24915
  - distributed_type: multi-GPU
- - num_devices: 2
  - gradient_accumulation_steps: 32
  - total_train_batch_size: 256
- - total_eval_batch_size: 4
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.02
- - num_epochs: 2.0
+ - num_epochs: 1.0

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
  |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
- | 0.1734 | 1.0 | 115 | 1.8751 | 57.9286 | 29.2743 | 44.7181 | 54.2295 | 122.123 |
- | 0.1098 | 2.0 | 230 | 2.1311 | 58.1114 | 29.339 | 44.7611 | 54.2823 | 122.364 |
+ | 0.1195 | 0.17 | 20 | 2.0635 | 57.8829 | 28.7887 | 44.4256 | 54.1299 | 121.8 |
+ | 0.1084 | 0.35 | 40 | 2.1178 | 58.0416 | 28.6487 | 44.3905 | 54.1557 | 122.893 |
+ | 0.1019 | 0.52 | 60 | 2.1576 | 57.816 | 28.7069 | 44.4242 | 53.9598 | 120.524 |
+ | 0.0975 | 0.7 | 80 | 2.1821 | 57.9597 | 28.8178 | 44.4854 | 54.068 | 121.793 |
+ | 0.0947 | 0.87 | 100 | 2.1846 | 57.9637 | 28.7446 | 44.3826 | 54.0399 | 122.77 |


  ### Framework versions
@@ -103,4 +108,4 @@ The following hyperparameters were used during training:
  - Transformers 4.26.0.dev0
  - Pytorch 1.13.0+cu117
  - Datasets 2.6.1
- - Tokenizers 0.13.1
+ - Tokenizers 0.13.1
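
The updated card describes a FLAN-T5 summarization checkpoint with a 1024-token cap on input and output. Below is a minimal, hedged usage sketch with `transformers`; the Hub repo id is an assumption inferred from the card title and committer namespace, so substitute the actual id if it differs.

```python
# Hedged usage sketch for the checkpoint this card describes.
# The repo id below is an assumption (card title + committer namespace);
# replace it with the actual Hub id if it differs.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/flan-t5-large-stacked-samsum-1024",  # assumed repo id
)

dialogue = (
    "Alice: Are we still on for lunch tomorrow?\n"
    "Bob: Yes, 12:30 at the usual spot.\n"
    "Alice: Great, see you then."
)

# The card notes a 1024-token cap on input/output, so truncate or chunk
# longer transcripts before summarizing.
result = summarizer(dialogue, max_length=128, truncation=True)
print(result[0]["summary_text"])
```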
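
The Rouge1/Rouge2/Rougel/Rougelsum values in the evaluation results appear to be ROUGE F-measures reported on a 0-100 scale. The sketch below shows how such scores are commonly computed with the `evaluate` library; it is illustrative, not necessarily the exact evaluation script behind the table.

```python
# Illustrative sketch of computing ROUGE scores like those in the results
# table, using the `evaluate` library; not necessarily the exact script
# used for this card.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["alice and bob agree to meet for lunch at 12:30 tomorrow."]
references = ["Alice and Bob confirm their 12:30 lunch for tomorrow."]

scores = rouge.compute(
    predictions=predictions,
    references=references,
    use_stemmer=True,
)

# `evaluate` returns fractions in [0, 1]; model cards typically report x100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```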