---
language: en
license: apache-2.0
tags:
- azureml
- t5
- summarization
- deepspeed
datasets:
- samsum
widget:
- text: 'Kevin: Hey man, are you excited to watch Finding Nemo tonight?

    Henry: Yea, I can''t wait to watch that same movie for the 89th time. Is Nate
    coming over to watch it with us tonight?

    Kevin: Yep, he said he''ll be arriving a bit later at around 7 since he gets off
    of work at 6. Have you taken out the garbage yet? It''s starting to make the kitchen
    really smell.

    Henry: Oh I forgot. I''ll do that once I''m finished with my assignment for my
    math class. I didn''t get to start on it until an hour ago, and it''s due in 30
    minutes.

    Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate
    is bringing his girlfriend and their cat too.

    Henry: Nice, I''m really looking forward to seeing them again.'
model-index:
- name: henryu-lin/t5-large-samsum-deepspeed
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: train
    metrics:
    - type: rouge
      value: 40.8694
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjgxNDg4YjM4YjY0MDVhZmY5ZjQ1YTgyN2RhOGIwY2M5YjljMjAwYjI0ZWEzMzMxMzBlYmE5MjY3ODM1MjI4YiIsInZlcnNpb24iOjF9.NkOSwlWC_r8ewewRk1X9KJxaTEWZ0lDz0SuABLeUf1tESeTBowSJJBXgwiYb7gjpHnipfcK2HczlNRl-KzdDAA
    - type: rouge
      value: 19.223
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjc3ZTY3ZWU0OWE5Zjc1ZjdiZWE4NDQ0YzI3MTMxNTM3ZDRjY2Y1YWM1OWQyOWMwMTZlMmRlZTI5ZGNkMmI5OSIsInZlcnNpb24iOjF9.4jHtzkDGNLPHSC7RN9Hi5jeiLy9F3JwBpDKdCjkiDmZY_cgHHCTr5v6QTr7VISZNQdCNg27iO0d8ohSIxVVXBg
    - type: rouge
      value: 31.0688
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmQzZjhhMjJjZWFkYWViODlhMDQ4MTY0YWI4NzRjZWMyMGZlMDQ1MjA0Yzk0MTczOWU0ODMyYjQ3NGEwOTZhNiIsInZlcnNpb24iOjF9.nUPHLaP5n_7YYbtV6ms0-fOGtPvEx826Ivsv-MfKiUVKyxTJ-9G_xbECK2cS1XQxuO05tlWhO89zz03vsNkuAQ
    - type: rouge
      value: 38.3786
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJlMGUwMTE4NWRkMWZkYzU0NzA3ODQzOWE2MWY3MWU2YmFjNzQ0ZTU2MDZiYTY0ZmY2N2U4NmUyOTY5NDRkMiIsInZlcnNpb24iOjF9.-T68JCuA99EVzu4fIOJN-Vyu-d__RYvfnKPaLu4pJ2cOmRVKh2Qc6pHnjXDP2powPu2R6pD6KcANZhEE4AEVAw
    - type: loss
      value: 2.184831380844116
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJkZjk3Y2YwM2E4ZDQxYzZkOGNkYzY3N2I4NzkyZmU4ZjYyYjk1Y2FiZjRkN2I1MTEzZmI4Y2FjNjBiYWNjZiIsInZlcnNpb24iOjF9.7OgmxB2mQ7CYH9p9p56bf7cAjkA6YflzB75zd0-O1WYgrsEibX-Zb2H6-0SMqxD-drWrRrEpma1Tu1fWSkBhDQ
    - type: gen_len
      value: 42.2081
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmEyZmYyOGVmN2JlYzM2MzI4YzU3ZDhjYmYwZDlhZTliNzk0ODVjNjU5MGNlMmRjOGZiOTk0MGU1YWM0NDcxMCIsInZlcnNpb24iOjF9.b9-F5AzXERN-pVcH61r23kaqdKO4iX79mQPRnoZ_riZ91o6UihsNftdGa50vgleloGDwkKT4aR6PNMZujCRZDQ
---
## `t5-large-samsum-deepspeed`

This model was trained using Microsoft's `AzureML` and `DeepSpeed`'s ZeRO stage 2 optimization. It was fine-tuned from the `t5-large` checkpoint on the `SAMSum` corpus.

More information on the fine-tuning process (including samples and benchmarks):
*(currently still WIP, major updates coming soon: 7/6/21 to 7/9/21)*
## Resource Usage

These results were retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

| key | value |
| --- | ----- |
| AzureML SKU | ND40rs_v2 (8 × V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Updated SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

*Peak memory usage is the average of the per-GPU peak values across all utilized GPUs.
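
The cost footnote (cost = run hours × hourly price) can be checked against the table. Below is a minimal sketch, assuming the table's costs are rounded to the cent; the hourly rates are backed out of the table's own numbers and are not taken from Azure's price list. The helper names `run_hours` and `compute_cost` are hypothetical.

```python
def run_hours(minutes: int, seconds: float) -> float:
    """Convert an 'Xm Y.YYs' run duration into hours."""
    return (minutes * 60 + seconds) / 3600


def compute_cost(hours: float, price_per_hour: float) -> float:
    """Cost in USD, rounded to the cent (assumed rounding)."""
    return round(hours * price_per_hour, 2)


hours = run_hours(12, 47.13)  # Run Duration from the table
# Implied hourly rates (approximate, not official prices)
low_priority_rate = round(0.94 / hours, 2)
dedicated_rate = round(4.69 / hours, 2)

print(low_priority_rate, dedicated_rate)  # 4.41 22.01
# Re-applying the implied rates reproduces the table's costs
print(compute_cost(hours, low_priority_rate), compute_cost(hours, dedicated_rate))  # 0.94 4.69
```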
### Carbon Emissions

These results were obtained using `codecarbon`. The carbon emissions are estimated from the training runtime only (excluding setup and evaluation).
CodeCarbon: https://github.com/mlco2/codecarbon

| key | value |
| --- | ----- |
| timestamp | 2021-07-08T06:29:27 |
| duration | 515.5018835067749 (s) |
| emissions | 0.043562840982919106 (kg CO₂eq) |
| energy_consumed | 0.14638051405550773 (kWh) |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
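
The three numeric `codecarbon` fields can be combined into an average power draw and the carbon intensity implied for this region. A small sketch, assuming `codecarbon`'s usual units (seconds, kg CO₂eq, kWh); the variable names are illustrative only.

```python
# Values copied from the codecarbon table above
duration_s = 515.5018835067749
emissions_kg = 0.043562840982919106
energy_kwh = 0.14638051405550773

hours = duration_s / 3600
avg_power_kw = energy_kwh / hours             # average power draw during training
carbon_intensity = emissions_kg / energy_kwh  # implied kg CO2eq per kWh
energy_kj = energy_kwh * 3600                 # 1 kWh = 3600 kJ (training runtime only)

print(f"{avg_power_kw:.2f} kW, {carbon_intensity:.4f} kg CO2eq/kWh, {energy_kj:.1f} kJ")
```

The kJ conversion is only for comparing units with the Resource Usage table; the two measurements cover different time spans (training only vs. the whole run).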
## Hyperparameters

```yaml
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same `per device batch size` was used for evaluation.
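
The effective batch size follows from the per-device batch size and the 8 GPUs of the ND40rs_v2 SKU, and it also determines the optimizer step count. A worked sketch, assuming no gradient accumulation (not stated in the card, but implied by 8 × 8 = 64) and that the last partial batch is kept:

```python
import math

per_device_batch_size = 8
num_gpus = 8                     # ND40rs_v2: 8 x V100 (Resource Usage table)
gradient_accumulation_steps = 1  # assumption, implied by 8 x 8 = 64
epochs = 3
train_samples = 14732            # from the Results table

effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps
# The final partial batch still counts as one step
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 64 231 693
```

The result agrees with the `total_steps` value reported under Results.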
### DeepSpeed

Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none`
```json
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
  }
}
```
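
The snippet above shows only the ZeRO section. A complete DeepSpeed config for the Hugging Face `Trainer` integration might look like the hypothetical sketch below, which fills in the stated optimizer and scheduler and uses `"auto"` so the `Trainer` copies values from its training arguments. This is an assumption about how the run was configured, not the author's actual file.

```python
import json

# Hypothetical full config; only "zero_optimization" is copied from the card
ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto",
        },
    },
    # No "offload_optimizer" entry, matching Offload = none
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1300000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1300000000,
        "contiguous_gradients": True,
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The resulting JSON (or the dict itself) can be passed to `TrainingArguments(deepspeed=...)`; the file name you save it under is up to you.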
## Usage

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")
conversation = '''Kevin: Hey man, are you excited to watch Finding Nemo tonight?
Henry: Yea, I can't wait to watch that same movie for the 89th time. Is Nate coming over to watch it with us tonight?
Kevin: Yep, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. I didn't get to start on it until an hour ago, and it's due in 30 minutes.
Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate is bringing his girlfriend and their cat too.
Henry: Nice, I'm really looking forward to seeing them again.
'''
# The pipeline returns a list with one dict per input: [{"summary_text": "..."}]
print(summarizer(conversation)[0]["summary_text"])
```
## Results

`eval_*` metrics were computed on the SAMSum validation split (818 samples) and `predict_*` metrics on the test split (819 samples).

| ROUGE | Score |
| ----- | ----- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |
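
ROUGE-1 measures unigram overlap between a generated summary and its reference, combined into an F-measure. The sketch below is a simplified ROUGE-1 F1 (lowercasing and alphanumeric tokenization, no stemming), so it is not guaranteed to match the implementation that produced the table; the function names and example sentences are hypothetical. Scores are multiplied by 100 to match the table's 0 to 100 scale.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits (simplified; no stemming)
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(prediction: str, reference: str) -> float:
    pred = Counter(tokenize(prediction))
    ref = Counter(tokenize(reference))
    # Clipped overlap: each unigram counts at most min(pred count, ref count) times
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("Nate is coming tonight.", "Nate is coming over tonight with his girlfriend.")
print(f"ROUGE-1 F1: {100 * score:.2f}")  # ROUGE-1 F1: 66.67
```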

| Metric | Value |
| ------ | ----- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 (s) |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 (s) |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 (s) |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
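
The throughput rows can be reproduced from the sample counts, step counts, and runtimes in the same table. A sketch, assuming training throughput counts every epoch (14732 × 3 samples) and that evaluation/prediction also use a global batch of 64 (8 per device × 8 GPUs); `per_second` is a hypothetical helper.

```python
import math

GLOBAL_BATCH = 64  # assumed: per-device batch size 8 x 8 GPUs


def per_second(count: float, runtime_s: float) -> float:
    # Rounded to 3 decimals, the precision shown in the table
    return round(count / runtime_s, 3)


# (samples/s, steps/s) per phase, from counts and runtimes in the table
train = (per_second(14732 * 3, 515.2198), per_second(693, 515.2198))
evaluation = (per_second(818, 61.275), per_second(math.ceil(818 / GLOBAL_BATCH), 61.275))
predict = (per_second(819, 63.3732), per_second(math.ceil(819 / GLOBAL_BATCH), 63.3732))

for name, (samples_ps, steps_ps) in [("train", train), ("eval", evaluation), ("predict", predict)]:
    print(f"{name}: {samples_ps} samples/s, {steps_ps} steps/s")
```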