Kaleemullah/bert-base-uncased-ad-nonad-classifier
Model Description
This model is a fine-tuned version of bert-base-uncased, specifically tailored for distinguishing between advertising (ad) and non-advertising (non-ad) text content. It is designed to understand the nuances and language patterns that differentiate promotional content from other types of text.
Intended Use
- Primary Use Case: Text classification, specifically identifying whether a given piece of text is an advertisement or not.
- Out-of-Scope Use Cases: This model is not intended for understanding context beyond the binary classification of ads vs. non-ads. It should not be used for complex natural language understanding tasks like sentiment analysis, question-answering, etc.
Training Data
The model was trained on a balanced dataset consisting of 40,000 examples, with 20,000 ads and 20,000 non-ads. Each text entry was preprocessed and tokenized using the BERT tokenizer.
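The card does not say how the held-out test set was carved out of the 40,000 examples. As a minimal sketch of one way to keep the ad/non-ad balance in both parts, the hypothetical `stratified_split` helper below splits each class separately. The 20% test fraction, the seed, and the toy texts are all assumptions for illustration, not the author's settings; the label convention (1 = Ad, 0 = Non-Ad) follows the usage code in this card.

```python
import random
from collections import Counter


def stratified_split(texts, labels, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so each class keeps the same share in both parts.

    test_fraction and seed are illustrative assumptions, not values from the card.
    """
    rng = random.Random(seed)

    # Group texts by their label so each class can be split on its own.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_test = int(len(items) * test_fraction)
        test += [(t, label) for t in items[:n_test]]
        train += [(t, label) for t in items[n_test:]]

    # Shuffle again so the classes are interleaved.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy stand-in data: 10 "ads" (label 1) and 10 "non-ads" (label 0).
texts = [f"ad {i}" for i in range(10)] + [f"note {i}" for i in range(10)]
labels = [1] * 10 + [0] * 10

train, test = stratified_split(texts, labels)
print(Counter(label for _, label in train))
print(Counter(label for _, label in test))
```

Splitting per class rather than over the whole pool means a balanced dataset stays balanced in both the training and test portions.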
Training Procedure
- Preprocessing: Text entries were tokenized using BertTokenizer with a maximum length of 512 tokens.
- Fine-Tuning: The model was fine-tuned on the preprocessed data for 3 epochs using the Hugging Face transformers Trainer API.
- Evaluation Metrics: The model's performance was evaluated based on accuracy, precision, recall, and F1-score.
Performance
The model achieved the following metrics on the test dataset:
- Accuracy: 99.71%
- Precision: 99.76%
- Recall: 99.67%
- F1-score: 99.72%
Note: This model is due to be updated soon. It currently overfits on one Non-Ad category.
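For reference, the four metrics reported above can be computed from true and predicted labels as in the sketch below. The card does not state which class was treated as positive or which tool computed the numbers; this example assumes Ad (label 1) is the positive class, and `binary_metrics` is a hypothetical helper, not the author's evaluation code. The labels in the example are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task.

    Assumes `positive` marks the Ad class (label 1); this is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 TP, 1 FN, 1 FP, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Running the same function separately on each non-ad category, where category labels are available, is one way to see whether aggregate scores hide weaker performance on a particular kind of text.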
How to Use
from transformers import BertTokenizer, BertForSequenceClassification
import torch

model_name = "Kaleemullah/bert-base-uncased-ad-nonad-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

def predict(text):
    # Tokenize the input, truncating to BERT's 512-token limit
    inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the higher-scoring class: 1 is "Ad", anything else is "Non-Ad"
    prediction = torch.argmax(outputs.logits, dim=-1)[0].item()
    return "Ad" if prediction == 1 else "Non-Ad"

# Example
print(predict("Your example text here"))
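The `predict` function returns only a hard label. As a minimal sketch of how the two logits could also yield a confidence score, the example below applies a softmax in plain Python and marks low-confidence results as "Uncertain". The helper names, the 0.9 threshold, and the example logits are all assumptions for illustration; with the real model, the input list could come from `outputs.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, threshold=0.9):
    """Return (label, probability) for a [non_ad_logit, ad_logit] pair.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    label = "Ad" if idx == 1 else "Non-Ad"
    if probs[idx] < threshold:
        label = "Uncertain"
    return label, probs[idx]


# Made-up logits: one clear-cut case and one borderline case.
confident = label_with_confidence([-2.0, 3.0])
borderline = label_with_confidence([0.2, 0.1])
print(confident)
print(borderline)
```

A threshold like this trades coverage for caution: borderline texts are set aside for review instead of being forced into one of the two classes.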