Toxic Comment Classification Using RoBERTa
Overview
This project provides a toxic comment classifier fine-tuned from RoBERTa (A Robustly Optimized BERT Pretraining Approach). The model labels each comment as toxic or non-toxic, which can help moderate online discussions and improve community interactions.
Model Details
- Model Name: RoBERTa for Toxic Comment Classification
- Architecture: RoBERTa
- Fine-tuning Task: Binary classification (toxic vs. non-toxic)
- Evaluation Metrics:
  - Accuracy
  - F1 Score
  - Precision
  - Recall
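For reference, the four metrics above can be derived from counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. The sketch below is a from-scratch illustration, not the project's evaluation code; it assumes labels are encoded as integers with 1 meaning "toxic", which the card does not specify.

```python
# From-scratch binary classification metrics (illustrative sketch).
# Assumption: labels are integers and 1 ("toxic") is the positive class.

def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Return accuracy, precision, recall and F1 for the given positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when the positive class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, made up for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

In practice, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score`, which cover the same computations.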
Files
- pytorch_model.bin: The trained model weights.
- config.json: Model configuration file.
- merges.txt: BPE tokenizer merges file.
- model.safetensors: Model weights in safetensors format.
- special_tokens_map.json: Tokenizer special tokens mapping.
- tokenizer_config.json: Tokenizer configuration file.
- vocab.json: Tokenizer vocabulary file.
- roberta-toxic-comment-classifier.pkl: Serialized best model state dictionary (for PyTorch).
- README.md: This documentation file.
Model Performance
- Accuracy: 0.9599
- F1 Score: 0.9615
- Precision: 0.9646
- Recall: 0.9599
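Accuracy and recall share the same value (0.9599) above. The card does not say how the scores were averaged across the two classes. One scheme that always produces this pattern is support-weighted averaging (scikit-learn's `average="weighted"`): weighting each class's recall TP_c / n_c by that class's share n_c / N of the true labels sums to (TP_0 + TP_1) / N, which is exactly accuracy. The sketch below illustrates that identity; it is an assumption about how the numbers may have been produced, not a description of the author's evaluation.

```python
# Support-weighted precision/recall/F1, in the spirit of scikit-learn's
# average="weighted". Assumption: the card does not confirm this averaging was used.
from collections import Counter


def weighted_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    support = Counter(y_true)  # number of true examples per class
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n_cls
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = n_cls / n  # each class weighted by its share of the true labels
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return {"accuracy": accuracy, **totals}


if __name__ == "__main__":
    # Toy labels, made up for illustration: weighted recall matches accuracy,
    # while weighted precision differs.
    print(weighted_scores([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 1]))
```

Under this scheme, weighted precision and F1 can still differ from accuracy, as they do in the reported numbers.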
Load the model
from transformers import pipeline
# Load the model and tokenizer
model_name = "prabhaskenche/pk-toxic-comment-classification-using-RoBERTa"
classifier = pipeline("text-classification", model=model_name)
# Example usage
text = "You're the worst person I've ever met."
result = classifier(text)
print(result)
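The pipeline returns a list of dictionaries, each holding a `label` and a `score` for the top predicted class. It also accepts a list of strings, e.g. `classifier(["first comment", "second comment"])`, and then returns one dictionary per input. The helper below is a hypothetical sketch for turning that output into moderation flags; the toxic label name `LABEL_1` and the default threshold are assumptions, so check the `id2label` entry in `config.json` for the model's actual label strings.

```python
# Hypothetical post-processing of text-classification pipeline output.
# Assumption: "LABEL_1" marks a toxic prediction; verify against id2label in config.json.
TOXIC_LABEL = "LABEL_1"


def flag_toxic(results: list[dict], toxic_label: str = TOXIC_LABEL,
               threshold: float = 0.5) -> list[bool]:
    """Return True for each result whose top label is toxic with score >= threshold.

    The pipeline reports only the top class by default, so for a binary model its
    score is already at least 0.5; raising the threshold flags only confident calls.
    """
    return [r["label"] == toxic_label and r["score"] >= threshold for r in results]


if __name__ == "__main__":
    # Shaped like the pipeline's output; these scores are made up for illustration.
    sample = [{"label": "LABEL_1", "score": 0.97}, {"label": "LABEL_0", "score": 0.88}]
    print(flag_toxic(sample))
```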
Usage
Installation
Install the required packages:
pip install torch transformers scikit-learn