akshatsanghvi committed on Jun 3

Commit 12fb5e6 • 2 Parent(s): ddaa8f4 d75c366

Merge branch 'main' of https://github.com/iiakshat/spam-mail-detection

README.md +69 -10

@@ -6,14 +6,73 @@ colorTo: blue
 sdk: gradio
 sdk_version: 3.17.0
 app_file: app.py
-pinned: false
-license: artistic-2.0
 ---
-# spam-mail-detection
-A simple text classifier in Python that uses the Naive Bayes model to classify e-mails as spam or ham,
-in other words, it used naive-bayes method to detect if a email or message is spam or not.
-### What is a Spam message ?
-Spam is any kind of unwanted, unsolicited digital communication that gets sent out in bulk.
-Often spam is sent via email, but it can also be distributed via text messages, phone calls, or social media.
-dataset downloaded from kaggle. 👉 https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download

+# Email Spam and Phishing URL Detection
+This project uses Naive Bayes classification to detect whether an email is spam, and XGBoost classification to determine whether a URL within an email is phishing or legitimate.
+# Getting Started
+## Project Overview
+The project consists of two main components:
+1. **Email Spam Detection**: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.
+2. **Phishing URL Detection**: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
+## Prerequisites
+Make sure you have Python 3.10 installed on your system. You can download it from [python.org](https://www.python.org/).
+## Requirements
+Ensure you have the following dependencies installed. You can install them using `pip install -r requirements.txt`.
+- gunicorn==22.0.0
+- python-dateutil==2.8.2
+- gradio==4.32.1
+- gradio_client==0.17.0
+- requests==2.31.0
+- beautifulsoup4==4.12.3
+- googlesearch_python==1.2.4
+- urlextract==1.9.0
+- numpy==1.26.3
+- pandas==2.2.0
+- scikit-learn==1.5.0
+- urllib3==2.1.0
+- python-whois==0.9.4
+- xgboost==2.0.3
+- lxml==5.2.2
+## Setup and Installation
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/your-username/email-spam-phishing-detection.git
+   cd email-spam-phishing-detection
+   ```
+2. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Usage
+1. **Data Preparation:**
+   - Ensure the datasets `spam.csv` and `urldata.csv` are available in the `data/` directory.
+2. **Model Training:**
+   - If necessary, modify and run the `notebook.ipynb` Jupyter notebook to train or fine-tune the machine learning models.
+   - Trained models will be saved in the `models/` directory.
+3. **Run the Application:**
+   - Execute `app.py` to start the application.
+   - A hosted version of the application is available at the [Hugging Face Space](https://huggingface.co/spaces/akshatsanghvi/spam-email-detection).
+## Acknowledgements
+- The email spam classification model is trained using the `spam.csv` dataset, sourced from [Dataset: Spam/ham mail](https://www.kaggle.com/datasets/mfaisalqureshi/spam-email?resource=download).
+- The URL phishing detection model is trained using the `urldata.csv` dataset, sourced from [Phishing Websites Dataset](https://www.kaggle.com/datasets).
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
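
To illustrate the Naive Bayes approach named in the README's Project Overview, here is a minimal sketch of a multinomial Naive Bayes text classifier written with only the Python standard library. It is not the project's code: the repository lists scikit-learn among its dependencies and presumably relies on it, and the class name `TinyNaiveBayes`, the `tokenize` helper, and the four training messages are all hypothetical, chosen only to show how word counts and class priors combine into a spam/ham decision.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training messages per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word frequencies, used for the word likelihoods.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any training message.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        # Return the class with the highest log posterior score.
        return max(scores, key=scores.get)


# Hypothetical toy data; the real project trains on spam.csv instead.
texts = [
    "win a free prize now",
    "free cash claim your prize",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = TinyNaiveBayes().fit(texts, labels)
print(model.predict("claim your free prize"))
print(model.predict("lunch meeting tomorrow"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and add-one smoothing keeps a word that appears in only one class from driving the other class's probability to zero.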