Kaleemullah/bert-base-uncased-ad-nonad-classifier

Model Description

This model is a fine-tuned version of bert-base-uncased, specifically tailored for distinguishing between advertising (ad) and non-advertising (non-ad) text content. It is designed to understand the nuances and language patterns that differentiate promotional content from other types of text.

Intended Use

  • Primary Use Case: Text classification, specifically identifying whether a given piece of text is an advertisement or not.
  • Out-of-Scope Use Cases: The model only outputs a binary ad vs. non-ad label. It should not be used for other natural language understanding tasks such as sentiment analysis or question answering.

Training Data

The model was trained on a balanced dataset consisting of 40,000 examples, with 20,000 ads and 20,000 non-ads. Each text entry was preprocessed and tokenized using the BERT tokenizer.

Training Procedure

  • Preprocessing: Text entries were tokenized using BertTokenizer with a maximum length of 512 tokens.
  • Fine-Tuning: The model was fine-tuned on the preprocessed data for 3 epochs using the Hugging Face transformers Trainer API.
  • Evaluation Metrics: The model's performance was evaluated based on accuracy, precision, recall, and F1-score.
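
The steps above could be reproduced roughly as follows. This is a minimal sketch, not the author's training script: the maximum length of 512 and the 3 epochs come from this card, while the batch size, learning rate, output directory, and the names FineTuneConfig and fine_tune are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    base_model: str = "bert-base-uncased"
    max_length: int = 512         # stated in this card
    num_epochs: int = 3           # stated in this card
    batch_size: int = 16          # assumption: not stated in the card
    learning_rate: float = 2e-5   # assumption: not stated in the card


def fine_tune(train_texts, train_labels, config=None, output_dir="ad-nonad-out"):
    """Fine-tune BERT for binary ad / non-ad classification (labels: 1 = ad, 0 = non-ad)."""
    # Heavy imports are kept inside the function so the config can be inspected without them.
    import torch
    from transformers import (
        BertForSequenceClassification,
        BertTokenizer,
        Trainer,
        TrainingArguments,
    )

    config = config or FineTuneConfig()
    tokenizer = BertTokenizer.from_pretrained(config.base_model)
    # Tokenize every text, truncating anything longer than max_length tokens.
    encodings = tokenizer(
        list(train_texts),
        truncation=True,
        padding=True,
        max_length=config.max_length,
    )

    class AdDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(train_labels)

        def __getitem__(self, idx):
            item = {key: torch.tensor(values[idx]) for key, values in encodings.items()}
            item["labels"] = torch.tensor(train_labels[idx])
            return item

    model = BertForSequenceClassification.from_pretrained(config.base_model, num_labels=2)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=config.num_epochs,
        per_device_train_batch_size=config.batch_size,
        learning_rate=config.learning_rate,
    )
    trainer = Trainer(model=model, args=args, train_dataset=AdDataset())
    trainer.train()
    return trainer
```

Calling fine_tune(texts, labels) with a list of strings and a matching list of 0/1 labels would start training; evaluation on a held-out split is left out of the sketch.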

Performance

The model achieved the following metrics on the test dataset:

  • Accuracy: 99.71%
  • Precision: 99.76%
  • Recall: 99.67%
  • F1-score: 99.72%

Note: This model is due to be updated soon; it currently overfits on one non-ad category.
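
For readers who want to recompute the four metrics above on their own predictions, here is a minimal sketch in plain Python. It assumes label 1 means "Ad" (as in the usage code in this card); the function name classification_metrics and the toy labels are illustrative and are not the author's evaluation data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = ad, 0 = non-ad).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)

# Sanity check: F1 recomputed from the rounded precision and recall reported in this card.
f1_from_reported = 2 * 0.9976 * 0.9967 / (0.9976 + 0.9967)
```

Recomputing F1 from the rounded precision and recall gives a value that agrees with the reported 99.72% to within rounding.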

How to Use

from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
model_name = "Kaleemullah/bert-base-uncased-ad-nonad-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

def predict(text):
    # Tokenize a single text, truncating to BERT's 512-token limit.
    inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the class with the highest logit: 1 is "Ad", anything else is "Non-Ad".
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Ad" if prediction == 1 else "Non-Ad"

# Example
print(predict("Your example text here"))
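
Because the model is noted to overfit on one non-ad category, it may be useful to look at a confidence score rather than only the argmax label. The following is a minimal sketch in plain Python; the helper names softmax and label_with_confidence and the default threshold of 0.5 are assumptions for illustration, and index 1 is taken to mean "Ad" as in predict above.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, threshold=0.5):
    """Return (label, ad_probability) for one example's two logits."""
    ad_prob = softmax(logits)[1]  # index 1 = "Ad" (assumed, matching predict)
    label = "Ad" if ad_prob >= threshold else "Non-Ad"
    return label, ad_prob


# Hypothetical logits for illustration; real ones would come from the model.
example = label_with_confidence([-1.0, 2.0], threshold=0.9)
```

With the model loaded as above, the logits for a single text could be passed in as outputs.logits[0].tolist(). Raising the threshold trades recall on ads for fewer false positives.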