language: en
license: apache-2.0
tags:
- azureml
- t5
- summarization
- deepspeed
datasets:
- samsum
widget:
- text: >-
Kevin: Hey man, are you excited to watch Finding Nemo tonight?
Henry: Yea, I can't wait to watch that same movie for the 89th time. Is
Nate coming over to watch it with us tonight?
Kevin: Yep, he said he'll be arriving a bit later at around 7 since he
gets off of work at 6. Have you taken out the garbage yet? It's starting
to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for
my math class. I didn't get to start on it until an hour ago, and it's due
in 30 minutes.
Kevin: Okay dude, you should take it out as soon as possible. By the way,
Nate is bringing his girlfriend and their cat too.
Henry: Nice, I'm really looking forward to seeing them again.
model-index:
- name: henryu-lin/t5-large-samsum-deepspeed
results:
- task:
type: summarization
name: Summarization
dataset:
name: samsum
type: samsum
config: samsum
split: train
metrics:
- type: rouge
value: 40.8694
name: ROUGE-1
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjgxNDg4YjM4YjY0MDVhZmY5ZjQ1YTgyN2RhOGIwY2M5YjljMjAwYjI0ZWEzMzMxMzBlYmE5MjY3ODM1MjI4YiIsInZlcnNpb24iOjF9.NkOSwlWC_r8ewewRk1X9KJxaTEWZ0lDz0SuABLeUf1tESeTBowSJJBXgwiYb7gjpHnipfcK2HczlNRl-KzdDAA
- type: rouge
value: 19.223
name: ROUGE-2
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjc3ZTY3ZWU0OWE5Zjc1ZjdiZWE4NDQ0YzI3MTMxNTM3ZDRjY2Y1YWM1OWQyOWMwMTZlMmRlZTI5ZGNkMmI5OSIsInZlcnNpb24iOjF9.4jHtzkDGNLPHSC7RN9Hi5jeiLy9F3JwBpDKdCjkiDmZY_cgHHCTr5v6QTr7VISZNQdCNg27iO0d8ohSIxVVXBg
- type: rouge
value: 31.0688
name: ROUGE-L
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmQzZjhhMjJjZWFkYWViODlhMDQ4MTY0YWI4NzRjZWMyMGZlMDQ1MjA0Yzk0MTczOWU0ODMyYjQ3NGEwOTZhNiIsInZlcnNpb24iOjF9.nUPHLaP5n_7YYbtV6ms0-fOGtPvEx826Ivsv-MfKiUVKyxTJ-9G_xbECK2cS1XQxuO05tlWhO89zz03vsNkuAQ
- type: rouge
value: 38.3786
name: ROUGE-LSUM
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJlMGUwMTE4NWRkMWZkYzU0NzA3ODQzOWE2MWY3MWU2YmFjNzQ0ZTU2MDZiYTY0ZmY2N2U4NmUyOTY5NDRkMiIsInZlcnNpb24iOjF9.-T68JCuA99EVzu4fIOJN-Vyu-d__RYvfnKPaLu4pJ2cOmRVKh2Qc6pHnjXDP2powPu2R6pD6KcANZhEE4AEVAw
- type: loss
value: 2.184831380844116
name: loss
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJkZjk3Y2YwM2E4ZDQxYzZkOGNkYzY3N2I4NzkyZmU4ZjYyYjk1Y2FiZjRkN2I1MTEzZmI4Y2FjNjBiYWNjZiIsInZlcnNpb24iOjF9.7OgmxB2mQ7CYH9p9p56bf7cAjkA6YflzB75zd0-O1WYgrsEibX-Zb2H6-0SMqxD-drWrRrEpma1Tu1fWSkBhDQ
- type: gen_len
value: 42.2081
name: gen_len
verified: true
verifyToken: >-
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmEyZmYyOGVmN2JlYzM2MzI4YzU3ZDhjYmYwZDlhZTliNzk0ODVjNjU5MGNlMmRjOGZiOTk0MGU1YWM0NDcxMCIsInZlcnNpb24iOjF9.b9-F5AzXERN-pVcH61r23kaqdKO4iX79mQPRnoZ_riZ91o6UihsNftdGa50vgleloGDwkKT4aR6PNMZujCRZDQ
## t5-large-samsum-deepspeed
This model was trained using Microsoft's AzureML and DeepSpeed's ZeRO 2 optimization. It was fine-tuned on the SAMSum corpus from the t5-large checkpoint.

More information on the fine-tuning process (includes samples and benchmarks):
*(currently still WIP, major updates coming soon: 7/6/21~7/9/21)*
## Resource Usage
These results are retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low priority clusters.

| key | value |
| --- | --- |
| AzureML SKU | ND40rs_v2 (8 x V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Updated SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/
*Peak memory usage is calculated as the average of the per-GPU peaks across all utilized GPUs.
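As a rough illustration of the cost footnote above, the figures follow directly from the run duration and an hourly SKU price; the prices in this sketch are assumptions chosen for illustration, not quoted Azure rates:

```python
# Rough sketch of the compute-cost calculation described above.
# The hourly prices are assumptions for this sketch, not quoted Azure rates.
run_duration_hours = (12 * 60 + 47.13) / 3600   # 12m 47.13s

low_priority_price_per_hour = 4.42              # assumed USD/hr, ND40rs_v2 low priority
dedicated_price_per_hour = 22.03                # assumed USD/hr, ND40rs_v2 dedicated

print(f"Low priority: ${run_duration_hours * low_priority_price_per_hour:.2f}")
print(f"Dedicated:    ${run_duration_hours * dedicated_price_per_hour:.2f}")
```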
## Carbon Emissions
These results are obtained using codecarbon. The carbon emissions are estimated from training runtime only (excluding setup and evaluation runtime).

CodeCarbon: https://github.com/mlco2/codecarbon

| key | value |
| --- | --- |
| timestamp | 2021-07-08T06:29:27 |
| duration | 515.5018835067749 (s) |
| emissions | 0.043562840982919106 (kg CO2eq) |
| energy_consumed | 0.14638051405550773 (kWh) |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
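For reference, a minimal sketch of how codecarbon can be wrapped around a training run to produce figures like these; the `time.sleep` call stands in for the actual fine-tuning loop:

```python
import time
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # picks up cloud/region info from the environment where available
tracker.start()
try:
    time.sleep(1)  # placeholder for the actual fine-tuning loop (e.g. trainer.train())
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg CO2eq
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```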
## Hyperparameters
```
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same per device batch size is used for evaluation.
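For orientation, a sketch of how these values might be expressed as Hugging Face `Seq2SeqTrainingArguments`; the output directory and DeepSpeed config path are assumptions, and the effective batch size of 64 comes from 8 per device across the 8 GPUs listed above:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the hyperparameters listed above; paths are placeholders.
training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-large-samsum-deepspeed",  # placeholder
    fp16=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,   # same per-device batch size for evaluation
    num_train_epochs=3.0,
    learning_rate=1e-4,
    weight_decay=0.1,
    seed=1,
    deepspeed="ds_config.json",     # ZeRO-2 config, see the DeepSpeed section below
)
```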
## DeepSpeed
Optimizer = AdamW, Scheduler = WarmupDecayLR, Offload = none

```json
"zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
}
```
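For context, with the Hugging Face Trainer integration a block like this usually sits inside a full DeepSpeed config together with the optimizer and scheduler named above, and is passed through the `deepspeed` training argument as a dict or a path to a JSON file. A hedged sketch, with `"auto"` values deferring to the corresponding `TrainingArguments`:

```python
# Sketch of a full config embedding the ZeRO-2 block above; "auto" values are
# filled in from TrainingArguments by the Hugging Face DeepSpeed integration.
ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {"type": "AdamW", "params": {"lr": "auto", "weight_decay": "auto"}},
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {"warmup_num_steps": "auto", "total_num_steps": "auto"},
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1_300_000_000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1_300_000_000,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# e.g. Seq2SeqTrainingArguments(..., deepspeed=ds_config)
```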
## Usage
```python
from transformers import pipeline
summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")

conversation = '''Kevin: Hey man, are you excited to watch Finding Nemo tonight?
Henry: Yea, I can't wait to watch that same movie for the 89th time. Is Nate coming over to watch it with us tonight?
Kevin: Yep, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. I didn't get to start on it until an hour ago, and it's due in 30 minutes.
Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate is bringing his girlfriend and their cat too.
Henry: Nice, I'm really looking forward to seeing them again.
'''
summarizer(conversation)
```
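Generation arguments can be passed through the pipeline to the underlying `generate` call; the values below are illustrative and not the settings used for the reported scores:

```python
# Illustrative generation settings, not the ones used for the reported metrics.
summarizer(conversation, max_length=60, min_length=10, do_sample=False)
```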
## Results
| ROUGE | Score |
| --- | --- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |

| Metric | Value |
| --- | --- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
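For context, a minimal sketch of how ROUGE scores in this style could be recomputed on the SAMSum test split using the `datasets` and `evaluate` libraries; this is a simplified loop over a small slice, not the Trainer evaluation that produced the table above:

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")
rouge = evaluate.load("rouge")

# Small slice for illustration only; the reported numbers cover the full splits.
test = load_dataset("samsum", split="test[:16]")
predictions = [out["summary_text"] for out in summarizer(test["dialogue"])]
print(rouge.compute(predictions=predictions, references=test["summary"]))
```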