PEGASUS Fine-Tuned for Dialogue Summarization
Introduction
This project fine-tunes the PEGASUS model for dialogue summarization using the Hugging Face Transformers library. The model is trained on the SAMSum dataset, which pairs conversational dialogues with human-written summaries. The workflow covers data preparation, model training, evaluation with ROUGE metrics, and visualization of token lengths; the resulting model and tokenizer are saved and uploaded to the Hugging Face Model Hub for sharing and deployment.
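As a rough outline, the data preparation and token-length visualization steps can look like the following sketch. It assumes the public `samsum` dataset on the Hugging Face Hub and the standard `datasets`/`transformers` APIs; the length limits are illustrative, not the project's exact settings.

```python
import matplotlib.pyplot as plt
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the SAMSum dataset (columns: "dialogue", "summary"); the loading
# script may additionally require the py7zr package to be installed.
dataset = load_dataset("samsum")
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-cnn_dailymail")

def preprocess(batch):
    # Tokenize dialogues as inputs and summaries as labels;
    # the max_length values here are illustrative assumptions.
    model_inputs = tokenizer(batch["dialogue"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

# Visualize the distribution of dialogue token lengths
dialogue_lengths = [len(ids) for ids in tokenized["train"]["input_ids"]]
plt.hist(dialogue_lengths, bins=30)
plt.xlabel("Dialogue length (tokens)")
plt.ylabel("Count")
plt.show()
```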
Fine-Tuning PEGASUS for Dialogue Summarization
This model is a fine-tuned version of the PEGASUS model, specifically adapted for summarizing dialogues. The fine-tuning was performed on the SAMSum dataset, which contains conversational dialogues and their corresponding summaries.
Model Details
- Base Model: google/pegasus-cnn_dailymail
- Fine-Tuned On: SAMSum dataset
- Model Type: Sequence-to-Sequence (Seq2Seq)
- Task: Dialogue Summarization
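Continuing from the preprocessing sketch above, the fine-tuning step can be reproduced roughly as follows. This is a sketch only: the output directory and hyperparameters are illustrative assumptions, not the exact configuration used for this model.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-cnn_dailymail")
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-samsum",     # illustrative output directory
    num_train_epochs=1,              # illustrative hyperparameters
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    weight_decay=0.01,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()

# Save the fine-tuned model and tokenizer for upload to the Hub
model.save_pretrained("pegasus-samsum")
tokenizer.save_pretrained("pegasus-samsum")
```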
Performance
The model's performance was evaluated with the ROUGE metric, which measures n-gram overlap (ROUGE-1, ROUGE-2) and longest-common-subsequence overlap (ROUGE-L) between generated and reference summaries. The fine-tuned model achieved the following scores:
| ROUGE Metric | Score    |
|--------------|----------|
| ROUGE-1      | 0.015558 |
| ROUGE-2      | 0.000301 |
| ROUGE-L      | 0.015546 |
| ROUGE-Lsum   | 0.015532 |
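Scores in this format can be computed with the `evaluate` library. Below is a minimal sketch; the prediction/reference pair is a made-up placeholder, not actual SAMSum data.

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Amanda baked cookies and will bring Jerry some tomorrow."]
references = ["Amanda baked cookies and will bring some to Jerry tomorrow."]

# Returns rouge1, rouge2, rougeL, and rougeLsum as F-measures in [0, 1]
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```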
Usage
To summarize dialogues with this model, you can use the `transformers` pipeline:
```python
from transformers import pipeline

# Load the fine-tuned PEGASUS model from the Hugging Face Hub
summarizer = pipeline("summarization", model="mynkchaudhry/Summarization-Pro")

# Example dialogue
dialogue = "Your dialogue text here."

# Generate a summary; the pipeline returns a list of dicts
summary = summarizer(dialogue)
print(summary[0]['summary_text'])
```
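Generation arguments such as `max_length` and `min_length` can be passed to the pipeline call (for example, `summarizer(dialogue, max_length=64, min_length=10)`) to control summary length; very long dialogues may need truncation to fit the model's input limit.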