# joblib unpickles the saved scikit-learn objects, so scikit-learn must be installed even though it is not imported here
from joblib import load
import streamlit as st


# Student and project details shown at the top of the page
info = [
    {"title": "NAME", "detail": "AKINBITAN TAIWO EMMANUEL"},
    {"title": "MATRIC NO", "detail": "HNDCOM/22/032"},
    {"title": "CLASS", "detail": "HND2"},
    {"title": "LEVEL", "detail": "400L"},
    {"title": "PROJECT SUPERVISOR", "detail": ""},
]

st.title("Project Information")

# Show each project detail as "TITLE: detail"
for item in info:
    st.write(f"{item['title']}: {item['detail']}")


st.image('fcahpt.jpg', caption='Federal College of Animal Health and Production Technology')
st.header('Spam Detection using Naive Bayes Classifier')
st.write('This spam detector is built in Python using a Naive Bayes classifier.')

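@st.cache_resource
def load_artifacts():
    # Load the fitted TF-IDF vectorizer and the trained model once, instead of on every rerun
    vectorizer = load('tfidf_vectorizer.joblib')
    model = load('Naive_Bayes_Spam_Detection.joblib')
    return vectorizer, model


vectorizer, model = load_artifacts()

user_input = st.text_area("Enter some text:", "")

# st.text_area returns an empty string (never None) when nothing is typed, so check for real text
if user_input.strip():
    # Transform the text with the same vectorizer that was used during training
    x = vectorizer.transform([user_input])
    pred = model.predict(x)

    # Label 1 = spam, 0 = ham
    if pred[0] == 1:
        st.markdown("<b>Prediction: <span style='color:red'>The entered text is likely to be spam. Be careful.</span></b>", unsafe_allow_html=True)
    elif pred[0] == 0:
        st.markdown("<b>Prediction: <span style='color:green'>The entered text is not spam and appears safe.</span></b>", unsafe_allow_html=True)
    else:
        st.write('Error, try again.')
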
st.header("Project Description")
st.markdown("""
Spam detection using a Naive Bayes classifier is a classic and effective approach for automatically identifying spam emails or messages.

Here is how it works, step by step:
""")

st.header("1. Data Collection and Preprocessing:")
st.markdown("""
- The process begins with collecting a dataset of emails or messages labeled as spam or non-spam (ham).
- Each message is preprocessed, for example by removing HTML tags, punctuation, and stopwords (very common words such as "and" and "the").
- The cleaned text is then tokenized and converted into numerical features using techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) or count vectorization.
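
As a minimal sketch of these preprocessing steps in plain Python: the stopword list is a tiny hypothetical sample, and the helper names `preprocess` and `tfidf` are illustrative. The IDF formula mirrors scikit-learn's default smoothed variant, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation. The exact settings of the vectorizer saved in `tfidf_vectorizer.joblib` are not shown here and may differ.

```python
import math
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS = {'a', 'an', 'and', 'are', 'for', 'the', 'to', 'is', 'of', 'you'}


def preprocess(text):
    # Replace HTML tags with spaces, lowercase, strip punctuation, split on whitespace, drop stopwords
    text = re.sub(r'<[^>]+>', ' ', text)
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    # docs: list of token lists. Returns one {term: weight} dict per document using raw counts,
    # smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalisation
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


messages = ['<b>WIN a FREE prize!</b> Click now', 'Are you free for lunch today?']
tokens = [preprocess(m) for m in messages]
print(tokens)  # [['win', 'free', 'prize', 'click', 'now'], ['free', 'lunch', 'today']]

# 'free' appears in both messages, so it gets a lower weight than 'win' in the first vector
vectors = tfidf(tokens)
```

Words that occur in every message carry little information, so IDF pushes their weight down and lets distinctive words dominate.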
""")

st.header("2. Understanding Naive Bayes Classifier:")
st.markdown("""
- Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. The theorem expresses the probability of a class given the observed words in terms of two quantities: the probability of those words given the class, and the prior probability of the class.
- The "naive" assumption in Naive Bayes is that the features (words) are conditionally independent given the class label. This lets the likelihood of a whole message be computed as a product of per-word probabilities, which keeps the algorithm simple and computationally efficient.
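
As a worked illustration, the snippet below applies Bayes' theorem with the naive independence assumption to a two-word message. All priors and word probabilities are invented for the example, and `joint` is a hypothetical helper name.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic
p_spam = 0.4  # prior P(spam)
p_ham = 0.6   # prior P(ham)

# Per-word likelihoods P(word | class)
likelihood = {
    'spam': {'free': 0.05, 'prize': 0.02},
    'ham': {'free': 0.01, 'prize': 0.001},
}
words = ['free', 'prize']


def joint(prior, probs, words):
    # Naive assumption: multiply per-word likelihoods as if words were independent given the class
    result = prior
    for w in words:
        result *= probs[w]
    return result


spam_score = joint(p_spam, likelihood['spam'], words)  # 0.4 * 0.05 * 0.02 = 0.0004
ham_score = joint(p_ham, likelihood['ham'], words)     # 0.6 * 0.01 * 0.001 = 0.000006

# Bayes' theorem: divide by the evidence P(words), which is the sum of the joint scores
p_spam_given_words = spam_score / (spam_score + ham_score)
print(round(p_spam_given_words, 4))  # 0.9852
```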
""")

st.header("3. Training the Naive Bayes Model:")
st.markdown("""
- The dataset is split into training and testing sets.
- During training, the Naive Bayes classifier learns the probability distribution of words (features) within each class (spam or ham).
- It estimates the prior probabilities of spam and ham messages, and the likelihood of each word occurring in spam and in ham messages.
- These probabilities are estimated from the training data by maximum likelihood (relative frequencies). This is usually combined with a smoothing technique such as Laplace (add-one) smoothing, so that a word never seen in one class does not get a probability of zero.
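
As a minimal from-scratch sketch of training the multinomial variant of Naive Bayes with Laplace smoothing: the function name `train_naive_bayes`, the `alpha` parameter, and the toy data are hypothetical. The train/test split is left out to keep the example short. The exact variant and settings behind `Naive_Bayes_Spam_Detection.joblib` are not stated here.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    # docs: list of token lists; labels: parallel list of 'spam' / 'ham'
    # Returns log priors and alpha-smoothed log likelihoods for each class
    vocab = {w for doc in docs for w in doc}
    log_prior = {}
    log_likelihood = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Prior: fraction of training messages that belong to this class
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # Add alpha to every word count so unseen words never get zero probability
        log_likelihood[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood, vocab


# Hypothetical toy training data
train_docs = [['win', 'free', 'prize'], ['free', 'cash', 'now'],
              ['lunch', 'today'], ['meeting', 'today', 'free']]
train_labels = ['spam', 'spam', 'ham', 'ham']

log_prior, log_likelihood, vocab = train_naive_bayes(train_docs, train_labels)
print(round(math.exp(log_prior['spam']), 2))  # 0.5
```

Working in log space avoids numerical underflow when many small probabilities are later combined.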
""")

st.header("4. Classification:")
st.markdown("""
- Once the model is trained, it can classify new, unseen messages.
- Given a new message, the classifier uses Bayes' theorem to calculate the probability that the message belongs to each class (spam or ham).
- By default, the message is assigned to the class with the higher probability. With two classes, this is equivalent to labelling it spam when the spam probability exceeds 0.5. The threshold can be adjusted to trade missed spam against legitimate messages wrongly flagged as spam.
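
As a minimal sketch of this decision rule: the log priors and log likelihoods below are invented, and `spam_probability` and `classify` are hypothetical helper names. Scores are summed in log space and normalised with the log-sum-exp trick.

```python
import math


def spam_probability(tokens, log_prior, log_likelihood):
    # Sum log-probabilities instead of multiplying raw probabilities to avoid underflow;
    # words outside the model's vocabulary are skipped
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
    # Log-sum-exp normalisation turns the scores into P(class | message)
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores['spam'] - top) / total


def classify(tokens, log_prior, log_likelihood, threshold=0.5):
    # threshold=0.5 matches picking the more probable class (ties go to ham);
    # a higher threshold flags fewer messages as spam
    return 'spam' if spam_probability(tokens, log_prior, log_likelihood) > threshold else 'ham'


# Hypothetical model parameters (probabilities invented for the example)
log_prior = {'spam': math.log(0.4), 'ham': math.log(0.6)}
log_likelihood = {
    'spam': {'free': math.log(0.05), 'prize': math.log(0.02), 'lunch': math.log(0.001)},
    'ham': {'free': math.log(0.01), 'prize': math.log(0.001), 'lunch': math.log(0.02)},
}

msg = ['free', 'prize']
print(classify(msg, log_prior, log_likelihood))                  # spam
print(classify(msg, log_prior, log_likelihood, threshold=0.99))  # ham
```

If the saved model is a scikit-learn Naive Bayes classifier, `model.predict_proba(x)` returns the same kind of per-class probabilities. A custom threshold can then be applied to the spam column instead of calling `model.predict(x)`.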
""")

st.header("5. Model Evaluation:")
st.markdown("""
- The performance of the Naive Bayes classifier is evaluated using metrics such as accuracy, precision, recall, and F1-score on a separate test dataset.
- These metrics help assess how well the model generalizes to unseen data and how effectively it distinguishes between spam and non-spam messages.
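
As a short sketch of how these metrics are computed, with spam as the positive class: the test labels and predictions are hypothetical, and `evaluate` is an illustrative helper name.

```python
def evaluate(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives, treating `positive` (spam) as positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of messages flagged as spam, how many were spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual spam, how much was caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}


# Hypothetical test labels and predictions (1 = spam, 0 = ham)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(evaluate(y_true, y_pred))
```

For a binary problem with spam as the positive label, these match the definitions behind scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.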
""")

st.header("6. Deployment and Fine-Tuning:")
st.markdown("""
- Once the model is trained and evaluated, it can be deployed for real-world use.
- Deployment may involve integrating the model into email systems or messaging platforms to automatically filter spam messages.
- Periodic updates and fine-tuning of the model may be necessary to adapt to changing spamming techniques and patterns.
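
As a sketch of how the two files this app loads could be produced: it assumes the saved objects are a scikit-learn `TfidfVectorizer` and `MultinomialNB` trained with labels 1 = spam and 0 = ham. The original training script is not shown, so `train_and_save` and the toy data are hypothetical. Retraining periodically on freshly labelled messages and overwriting these files is one way to keep the deployed model up to date.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def train_and_save(texts, labels, out_dir='.'):
    # Fit the vectorizer and a multinomial Naive Bayes model, then save both
    # under the file names the app loads
    vectorizer = TfidfVectorizer()
    x = vectorizer.fit_transform(texts)
    model = MultinomialNB()  # alpha=1.0 (Laplace smoothing) by default
    model.fit(x, labels)
    dump(vectorizer, os.path.join(out_dir, 'tfidf_vectorizer.joblib'))
    dump(model, os.path.join(out_dir, 'Naive_Bayes_Spam_Detection.joblib'))
    return vectorizer, model


# Hypothetical toy data (1 = spam, 0 = ham)
texts = ['Win a free prize now', 'Claim your free cash reward',
         'Are we still meeting for lunch?', 'Please review the attached report']
labels = [1, 1, 0, 0]

# Save to a temporary directory here; the app expects the files next to the script
with tempfile.TemporaryDirectory() as out_dir:
    train_and_save(texts, labels, out_dir)
    vectorizer = load(os.path.join(out_dir, 'tfidf_vectorizer.joblib'))
    model = load(os.path.join(out_dir, 'Naive_Bayes_Spam_Detection.joblib'))
    print(model.predict(vectorizer.transform(['free prize'])))  # [1]
```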
""")