---
license: creativeml-openrail-m
datasets:
  - prithivMLmods/Spam-Text-Detect-Analysis
language:
  - en
base_model:
  - google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# SPAM DETECTION UNCASED [ SPAM / HAM ]

This model fine-tunes BERT (Bidirectional Encoder Representations from Transformers) for binary sequence classification (Spam / Ham). It is trained on the prithivMLmods/Spam-Text-Detect-Analysis dataset and integrates Weights & Biases (wandb) for comprehensive experiment tracking.
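For quick inference, the fine-tuned model can be loaded with the `pipeline` API. A minimal sketch, assuming the model is published on the Hub under `prithivMLmods/Spam-Bert-Uncased` (adjust the ID if it differs):

```python
from transformers import pipeline

# Assumed Hub model ID; replace with the actual repository if it differs.
classifier = pipeline("text-classification", model="prithivMLmods/Spam-Bert-Uncased")

print(classifier("Congratulations! You've won a free cruise. Call now to claim."))
# -> e.g. [{'label': 'SPAM', 'score': 0.99}]  (label names depend on the model config)
```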


## 🛠️ Overview

Core details:

- **Model:** BERT for sequence classification
- **Pre-trained model:** `bert-base-uncased`
- **Task:** Spam detection, a binary classification task (Spam vs. Ham)
- **Metrics tracked** (see the metric sketch below):
  - Accuracy
  - Precision
  - Recall
  - F1 score
  - Evaluation loss
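
A minimal sketch of how these metrics might be computed with scikit-learn and passed to the Hugging Face `Trainer` (the function name and binary averaging are assumptions; evaluation loss is reported by the `Trainer` automatically):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Compute accuracy, precision, recall, and F1 for Trainer evaluation."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```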

## 📊 Key Results

Validation results for the fine-tuned model on the dataset above:

- **Accuracy:** 0.9937
- **Precision:** 0.9931
- **Recall:** 0.9597
- **F1 score:** 0.9761

## 📈 Model Training Details

### Model Architecture

The model uses `bert-base-uncased` as the pre-trained backbone and is fine-tuned for the sequence classification task.

### Training Parameters

The run used the following hyperparameters (see the sketch below):

- **Learning rate:** 2e-5
- **Batch size:** 16
- **Epochs:** 3
- **Loss:** cross-entropy
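
These hyperparameters map directly onto `TrainingArguments`; a sketch (`output_dir` is an assumption, and cross-entropy is the default loss of `BertForSequenceClassification`, so it needs no explicit setting):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",            # assumed output directory
    learning_rate=2e-5,              # learning rate from the list above
    per_device_train_batch_size=16,  # batch size from the list above
    per_device_eval_batch_size=16,
    num_train_epochs=3,              # epochs from the list above
)
```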

## 🚀 How to Train the Model

1. **Clone the repository:**

   ```bash
   git clone <repository-url>
   cd <project-directory>
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

   or manually:

   ```bash
   pip install transformers datasets wandb scikit-learn
   ```

3. **Train the model.** Assuming you have a script like `train.py` that exposes a `main()` entry point, run:

   ```python
   from train import main

   main()
   ```
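
A minimal sketch of what such a `train.py` might contain, assuming the dataset exposes `Message` and `Category` columns (the column names, split handling, and label mapping are assumptions; check the dataset card for the actual schema):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

def main():
    # Assumed schema: a single "train" split with "Message" and "Category" columns.
    dataset = load_dataset("prithivMLmods/Spam-Text-Detect-Analysis")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    label2id = {"ham": 0, "spam": 1}

    def preprocess(batch):
        encoded = tokenizer(batch["Message"], truncation=True, padding="max_length")
        encoded["labels"] = [label2id[c.lower()] for c in batch["Category"]]
        return encoded

    tokenized = dataset["train"].map(preprocess, batched=True)
    split = tokenized.train_test_split(test_size=0.2)

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="results",
            learning_rate=2e-5,
            per_device_train_batch_size=16,
            num_train_epochs=3,
        ),
        train_dataset=split["train"],
        eval_dataset=split["test"],
        # Pass compute_metrics (see the Overview sketch) to log the metrics above.
    )
    trainer.train()

if __name__ == "__main__":
    main()
```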
    

## ✨ Weights & Biases Integration

### Why use wandb?

- Monitor experiments in real time through visualizations.
- Log metrics such as loss, accuracy, precision, recall, and F1 score.
- Keep a history of past runs for comparison.

### Initialize Weights & Biases

Include this snippet in your training script:

```python
import wandb

wandb.init(project="spam-detection")
```
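
When training with the `Trainer`, you can also route its logs (loss and evaluation metrics) to the same wandb project by setting `report_to` in `TrainingArguments` (the run name below is hypothetical):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",
    report_to="wandb",               # send Trainer logs to Weights & Biases
    run_name="bert-spam-detection",  # hypothetical run name
)
```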

## 📁 Directory Structure

The project is organized for scalability and a clear separation of components:

```
project-directory/
│
├── data/               # Dataset processing scripts
├── wandb/              # Logged artifacts from wandb runs
├── results/            # Training and evaluation results
├── model/              # Trained model checkpoints
├── requirements.txt    # List of dependencies
└── train.py            # Main script for training the model
```

## 🔗 Dataset Information

The training data comes from Spam-Text-Detect-Analysis, available on the Hugging Face Hub:

- **Dataset:** prithivMLmods/Spam-Text-Detect-Analysis
- **Size:** 5.57k entries
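
The dataset can be pulled directly from the Hub with the `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Spam-Text-Detect-Analysis")
print(dataset)  # inspect the available splits and columns
```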

Questions about setting up the training pipeline, optimizing metrics, visualizing runs with wandb, or deploying the fine-tuned model are welcome in the repository's discussions. 🚀