akshatsanghvi
committed on
Merge branch 'main' of https://github.com/iiakshat/spam-mail-detection
README.md
CHANGED
@@ -6,14 +6,73 @@ colorTo: blue
sdk: gradio
sdk_version: 3.17.0
app_file: app.py
-pinned: false
-license: artistic-2.0
---
# Email Spam and Phishing URL Detection

This project uses Naive Bayes classification to detect whether an email is spam, and XGBoost classification to determine whether a URL within an email is phishing or legitimate.
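To make the Naive Bayes half of this concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written with only the Python standard library. It illustrates the technique and is not the project's model: the repository lists scikit-learn among its requirements and presumably relies on it (an assumption). The names `tokenize`, `train_naive_bayes`, `predict`, and the four toy messages are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_naive_bayes(samples):
    """Fit multinomial Naive Bayes on an iterable of (text, label) pairs.

    Returns (log_priors, word_counts, total_words, vocabulary).
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        doc_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    total_docs = sum(doc_counts.values())
    log_priors = {label: math.log(n / total_docs) for label, n in doc_counts.items()}
    total_words = {label: sum(counts.values()) for label, counts in word_counts.items()}
    return log_priors, word_counts, total_words, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior score for the text."""
    log_priors, word_counts, total_words, vocabulary = model
    vocab_size = len(vocabulary)
    scores = {}
    for label, log_prior in log_priors.items():
        score = log_prior
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            count = word_counts[label][token]
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words[label] + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data; the real project trains on spam.csv.
TOY_DATA = [
    ("win a free prize now", "spam"),
    ("claim your free cash reward", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("please review the attached report", "ham"),
]

model = train_naive_bayes(TOY_DATA)
print(predict(model, "free prize waiting for you"))  # spam
print(predict(model, "see you at lunch"))            # ham
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages are longer than a few words.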

# Getting Started
## Project Overview

The project consists of two main components:

1. **Email Spam Detection**: This component uses a Naive Bayes classifier to label emails as spam or not spam based on features of their content.

2. **Phishing URL Detection**: This component uses an XGBoost classifier to decide whether URLs found in emails belong to phishing attempts or legitimate websites.
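Before a classifier can judge a URL, the URL has to be pulled out of the email and turned into numeric features. The sketch below shows one way that step could look using only the standard library. It is an assumption about the general approach, not the project's code: the project lists `urlextract` and `xgboost` in its requirements, but this README does not describe how they are used. The regex, the function names `extract_urls` and `url_features`, and the chosen features are all hypothetical and are not the columns of `urldata.csv`.

```python
import re
from urllib.parse import urlparse

# Simple pattern for http(s) URLs; a dedicated extractor handles more cases.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_urls(email_text):
    """Return every http(s) URL in the text, minus trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(email_text)]


def url_features(url):
    """Compute a few hypothetical lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    host_is_ip = bool(IPV4_PATTERN.match(host))
    # Naive subdomain count: dots beyond "name.tld"; ignores suffixes like co.uk.
    subdomain_count = 0 if host_is_ip else max(host.count(".") - 1, 0)
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip),
        "has_at_symbol": int("@" in url),
        "subdomain_count": subdomain_count,
        "hyphens_in_host": host.count("-"),
    }


email = (
    "Your account is locked. Verify at http://192.168.10.5/login.php?id=7 "
    "or visit https://www.example.com."
)
for url in extract_urls(email):
    print(url, url_features(url))
```

Each feature dictionary could then be converted to a row of numbers and passed to a trained classifier; that step is omitted here because the model's expected inputs are not documented.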

## Prerequisites
Make sure you have Python 3.10 installed on your system. You can download it from [python.org](https://www.python.org/).

## Requirements
Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.

- gunicorn==22.0.0
- python-dateutil==2.8.2
- gradio==4.32.1
- gradio_client==0.17.0
- requests==2.31.0
- beautifulsoup4==4.12.3
- googlesearch_python==1.2.4
- urlextract==1.9.0
- numpy==1.26.3
- pandas==2.2.0
- scikit-learn==1.5.0
- urllib3==2.1.0
- python-whois==0.9.4
- xgboost==2.0.3
- lxml==5.2.2
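If you want to confirm that the installed packages match these pins, a small check along the following lines may help. It is a sketch that assumes every requirement uses the `name==version` form shown above; `parse_pins` and `check_pins` are hypothetical helper names, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' lines into {name: version}, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every pin that is not met."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


# A few pins copied from the list above; read requirements.txt for the full set.
sample = ["xgboost==2.0.3", "scikit-learn==1.5.0", "gradio==4.32.1"]
for name, (wanted, installed) in check_pins(parse_pins(sample)).items():
    print(f"{name}: wanted {wanted}, found {installed or 'not installed'}")
```

To check the whole file, pass `open("requirements.txt").read().splitlines()` to `parse_pins` instead of the sample list.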

## Setup and Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/email-spam-phishing-detection.git
   cd email-spam-phishing-detection
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage
1. **Data Preparation:**
   - Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.

2. **Model Training:**
   - If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
   - Trained models will be saved in the `models/` directory.

3. **Run the Application:**
   - Execute `app.py` to start the application.
   - Access the application at [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection).
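Before opening the notebook, it can save time to confirm that the data preparation step is complete. The following is a minimal sketch of such a check. The file names and the `data/` directory come from the steps above, while the helper name `missing_datasets` and the idea of running it from the project root are assumptions.

```python
from pathlib import Path

# Dataset file names taken from the Data Preparation step.
REQUIRED_DATASETS = ("spam.csv", "urldata.csv")


def missing_datasets(project_root="."):
    """Return the required dataset files that are absent from <project_root>/data."""
    data_dir = Path(project_root) / "data"
    return [name for name in REQUIRED_DATASETS if not (data_dir / name).is_file()]


missing = missing_datasets()
if missing:
    print("Missing from data/:", ", ".join(missing))
else:
    print("All datasets found; the notebook can be run.")
```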

## Acknowledgements

- The email spam classification model is trained on the `spam.csv` dataset, sourced from [Dataset: Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
- The URL phishing detection model is trained on the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.