---
language: en
license: apache-2.0
tags:
- azureml
- t5
- summarization
- deepspeed
datasets:
- samsum
widget:
- text: 'Kevin: Hey man, are you excited to watch Finding Nemo tonight?

    Henry: Yea, I can''t wait to watch that same movie for the 89th time. Is Nate
    coming over to watch it with us tonight?

    Kevin: Yep, he said he''ll be arriving a bit later at around 7 since he gets off
    of work at 6. Have you taken out the garbage yet? It''s starting to make the kitchen
    really smell.

    Henry: Oh I forgot. I''ll do that once I''m finished with my assignment for my
    math class. I didn''t get to start on it until an hour ago, and it''s due in 30
    minutes.

    Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate
    is bringing his girlfriend and their cat too.

    Henry: Nice, I''m really looking forward to seeing them again.'
model-index:
- name: henryu-lin/t5-large-samsum-deepspeed
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: train
    metrics:
    - type: rouge
      value: 40.8694
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjgxNDg4YjM4YjY0MDVhZmY5ZjQ1YTgyN2RhOGIwY2M5YjljMjAwYjI0ZWEzMzMxMzBlYmE5MjY3ODM1MjI4YiIsInZlcnNpb24iOjF9.NkOSwlWC_r8ewewRk1X9KJxaTEWZ0lDz0SuABLeUf1tESeTBowSJJBXgwiYb7gjpHnipfcK2HczlNRl-KzdDAA
    - type: rouge
      value: 19.223
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjc3ZTY3ZWU0OWE5Zjc1ZjdiZWE4NDQ0YzI3MTMxNTM3ZDRjY2Y1YWM1OWQyOWMwMTZlMmRlZTI5ZGNkMmI5OSIsInZlcnNpb24iOjF9.4jHtzkDGNLPHSC7RN9Hi5jeiLy9F3JwBpDKdCjkiDmZY_cgHHCTr5v6QTr7VISZNQdCNg27iO0d8ohSIxVVXBg
    - type: rouge
      value: 31.0688
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmQzZjhhMjJjZWFkYWViODlhMDQ4MTY0YWI4NzRjZWMyMGZlMDQ1MjA0Yzk0MTczOWU0ODMyYjQ3NGEwOTZhNiIsInZlcnNpb24iOjF9.nUPHLaP5n_7YYbtV6ms0-fOGtPvEx826Ivsv-MfKiUVKyxTJ-9G_xbECK2cS1XQxuO05tlWhO89zz03vsNkuAQ
    - type: rouge
      value: 38.3786
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJlMGUwMTE4NWRkMWZkYzU0NzA3ODQzOWE2MWY3MWU2YmFjNzQ0ZTU2MDZiYTY0ZmY2N2U4NmUyOTY5NDRkMiIsInZlcnNpb24iOjF9.-T68JCuA99EVzu4fIOJN-Vyu-d__RYvfnKPaLu4pJ2cOmRVKh2Qc6pHnjXDP2powPu2R6pD6KcANZhEE4AEVAw
    - type: loss
      value: 2.184831380844116
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTJkZjk3Y2YwM2E4ZDQxYzZkOGNkYzY3N2I4NzkyZmU4ZjYyYjk1Y2FiZjRkN2I1MTEzZmI4Y2FjNjBiYWNjZiIsInZlcnNpb24iOjF9.7OgmxB2mQ7CYH9p9p56bf7cAjkA6YflzB75zd0-O1WYgrsEibX-Zb2H6-0SMqxD-drWrRrEpma1Tu1fWSkBhDQ
    - type: gen_len
      value: 42.2081
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmEyZmYyOGVmN2JlYzM2MzI4YzU3ZDhjYmYwZDlhZTliNzk0ODVjNjU5MGNlMmRjOGZiOTk0MGU1YWM0NDcxMCIsInZlcnNpb24iOjF9.b9-F5AzXERN-pVcH61r23kaqdKO4iX79mQPRnoZ_riZ91o6UihsNftdGa50vgleloGDwkKT4aR6PNMZujCRZDQ
---

## `t5-large-samsum-deepspeed`
This model was trained using Microsoft's `AzureML` and `DeepSpeed`'s ZeRO stage 2 optimization. It was fine-tuned on the `SAMSum` corpus, starting from the `t5-large` checkpoint.

More information on the fine-tuning process (including samples and benchmarks):
(currently still WIP, major updates coming soon: 7/6/21~7/9/21)

## Resource Usage
These results were retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

| key | value |
| --- | ----- |
| AzureML SKU | ND40rs_v2 (8 X V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Up-to-date SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

*Peak memory usage is the average of the peak values across all utilized GPUs.
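The compute cost calculation described above can be sketched in a few lines. The hourly prices are not stated in this card, so the snippet back-calculates them from the reported (rounded) costs; treat the resulting rates as approximations, not as official Azure pricing. The function names are illustrative only.

```python
# Run duration reported in the Resource Usage table: 12m 47.13s.
RUN_DURATION_S = 12 * 60 + 47.13


def compute_cost(duration_s: float, price_per_hour: float) -> float:
    """Cost in USD = run duration in hours x SKU price per hour."""
    return duration_s / 3600.0 * price_per_hour


def implied_hourly_price(cost_usd: float, duration_s: float) -> float:
    """Invert compute_cost to recover the hourly price from a reported cost."""
    return cost_usd * 3600.0 / duration_s


# Hourly rates implied by the reported costs (approximate, because the
# reported costs are rounded to the cent).
low_priority_rate = implied_hourly_price(0.94, RUN_DURATION_S)
dedicated_rate = implied_hourly_price(4.69, RUN_DURATION_S)

print(f"implied low-priority rate: ${low_priority_rate:.2f}/h")
print(f"implied dedicated rate:    ${dedicated_rate:.2f}/h")
print(f"dedicated cost check:      ${compute_cost(RUN_DURATION_S, dedicated_rate):.2f}")
```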

### Carbon Emissions
These results were obtained using `codecarbon`. Carbon emissions are estimated from the training runtime only (setup and evaluation runtime are excluded).
CodeCarbon: https://github.com/mlco2/codecarbon

| key | value |
| --- | ----- |
| timestamp | 2021-07-08T06:29:27 |
| duration | 515.5018835067749 |
| emissions | 0.043562840982919106 |
| energy_consumed | 0.14638051405550773 |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
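As a rough sanity check on the table above, two derived quantities can be computed from the logged values. The units are an assumption here, following `codecarbon`'s usual output conventions (duration in seconds, energy in kWh, emissions in kg CO2-equivalent); the card itself does not state them.

```python
# Values copied from the codecarbon table above.
duration_s = 515.5018835067749          # assumed unit: seconds
emissions_kg = 0.043562840982919106     # assumed unit: kg CO2-equivalent
energy_kwh = 0.14638051405550773        # assumed unit: kWh

# Carbon intensity implied by the log: emissions per unit of energy.
carbon_intensity = emissions_kg / energy_kwh  # kg CO2eq per kWh

# Average power draw over the tracked training window (1 kWh = 3.6e6 J).
avg_power_w = energy_kwh * 3.6e6 / duration_s

print(f"implied carbon intensity: {carbon_intensity:.4f} kg CO2eq/kWh")
print(f"average power draw:       {avg_power_w:.0f} W")
```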

## Hyperparameters
```yaml
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same `per device batch size` was used for evaluation.
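The relationship between the per-device and effective batch sizes, and the resulting number of optimizer steps, can be sketched as follows. The GPU count comes from the Resource Usage table and the sample count from the Results section of this card; a gradient accumulation factor of 1 is an assumption (it is not stated here, but it is what makes 8 x 8 equal the listed effective batch size).

```python
import math

PER_DEVICE_BATCH_SIZE = 8
NUM_GPUS = 8            # ND40rs_v2: 8 x V100
GRAD_ACCUM_STEPS = 1    # assumption: not stated in this card
EPOCHS = 3
TRAIN_SAMPLES = 14732   # train_samples from the Results section

# Effective (global) batch size seen by each optimizer step.
effective_batch_size = PER_DEVICE_BATCH_SIZE * NUM_GPUS * GRAD_ACCUM_STEPS

# The last, partially filled batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch_size)
total_steps = steps_per_epoch * EPOCHS

print(effective_batch_size)  # 64
print(steps_per_epoch)       # 231
print(total_steps)           # 693
```

The computed step count agrees with the `total_steps` value reported in the Results section.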

### DeepSpeed
Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none`
```json
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
  }
}
```
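For orientation, the pieces listed above could be combined into a single DeepSpeed configuration along these lines. This is a sketch under stated assumptions, not the configuration file used for this model: the warmup length and gradient accumulation factor are not given in this card, and the learning rate, weight decay, batch size, and step count are taken from the Hyperparameters and Results sections.

```python
import json

TOTAL_STEPS = 693   # total_steps reported in the Results section
WARMUP_STEPS = 0    # assumption: warmup length is not stated in this card

ds_config = {
    "fp16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,  # assumption: not stated in this card
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4, "weight_decay": 0.1},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 1e-4,
            "warmup_num_steps": WARMUP_STEPS,
            "total_num_steps": TOTAL_STEPS,
        },
    },
    # ZeRO stage 2 settings as listed above; no offload section is added.
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1300000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1300000000,
        "contiguous_gradients": True,
    },
}

# Serialize to JSON, the format DeepSpeed reads its configuration from.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```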

## Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")

conversation = '''Kevin: Hey man, are you excited to watch Finding Nemo tonight?
Henry: Yea, I can't wait to watch that same movie for the 89th time. Is Nate coming over to watch it with us tonight?
Kevin: Yep, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. I didn't get to start on it until an hour ago, and it's due in 30 minutes.
Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate is bringing his girlfriend and their cat too.
Henry: Nice, I'm really looking forward to seeing them again.
'''

# The pipeline returns a list with one dict per input; the text is under "summary_text".
result = summarizer(conversation)
print(result[0]["summary_text"])
```

## Results
| ROUGE | Score |
| ----- | ----- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |
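To give a feel for what the ROUGE-1 numbers measure, here is a deliberately simplified unigram-overlap F1 in plain Python. It is only an illustration: the scores in the table were presumably produced by a standard ROUGE implementation with its own tokenization and stemming, which this sketch does not reproduce, and the example sentences are made up. The table reports scores scaled by 100.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and candidate summaries, for illustration only.
reference = "nate is bringing his girlfriend and their cat"
candidate = "nate will bring his girlfriend and a cat"

score = rouge1_f1(reference, candidate)
print(f"{100 * score:.1f}")  # 62.5
```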

| Metric | Value |
| ------ | ----- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
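
The throughput rows in the table follow from the sample counts, step count, and runtimes. The short check below recomputes them, assuming runtimes are in seconds and that the training throughput counts every sample once per epoch (3 epochs, per the Hyperparameters section).

```python
# Values copied from the table above and the Hyperparameters section.
train_samples, epochs = 14732, 3
train_runtime_s, total_steps = 515.2198, 693
eval_samples, eval_runtime_s = 818, 61.275
predict_samples, predict_runtime_s = 819, 63.3732

# Training throughput: samples processed over all epochs, per second.
train_samples_per_second = train_samples * epochs / train_runtime_s
train_steps_per_second = total_steps / train_runtime_s

# Evaluation and prediction make a single pass over their splits.
eval_samples_per_second = eval_samples / eval_runtime_s
predict_samples_per_second = predict_samples / predict_runtime_s

print(f"train:   {train_samples_per_second:.3f} samples/s, {train_steps_per_second:.3f} steps/s")
print(f"eval:    {eval_samples_per_second:.3f} samples/s")
print(f"predict: {predict_samples_per_second:.3f} samples/s")
```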