Toxic Comment Classification Using RoBERTa

Overview

This project provides a toxic comment classification model based on RoBERTa (Robustly optimized BERT approach). The model classifies comments as toxic or non-toxic, helping to moderate online discussions and improve community interactions.

Model Details

  • Model Name: RoBERTa for Toxic Comment Classification
  • Architecture: RoBERTa
  • Parameters: ~355M (FP32 weights)
  • Fine-tuning Task: Binary classification (toxic vs. non-toxic)
  • Evaluation Metrics:
    • Accuracy
    • F1 Score
    • Precision
    • Recall

Files

  • pytorch_model.bin: The trained model weights.
  • config.json: Model configuration file.
  • merges.txt: BPE tokenizer merges file.
  • model.safetensors: Model weights in safetensors format.
  • special_tokens_map.json: Tokenizer special tokens mapping.
  • tokenizer_config.json: Tokenizer configuration file.
  • vocab.json: Tokenizer vocabulary file.
  • roberta-toxic-comment-classifier.pkl: Serialized best model state dictionary (for PyTorch); see the loading sketch after this list.
  • README.md: This documentation file.
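
The roberta-toxic-comment-classifier.pkl file is described above as a serialized state dictionary. If it was saved with torch.save(model.state_dict(), ...), it could be restored roughly as follows; the save format and local file paths here are assumptions, so treat this as a sketch rather than the project's documented loading path:

import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Sketch: rebuild the architecture from the repository's config.json, then
# load the pickled weights. Assumes the .pkl holds a plain PyTorch state_dict.
config = RobertaConfig.from_json_file("config.json")
model = RobertaForSequenceClassification(config)

state_dict = torch.load("roberta-toxic-comment-classifier.pkl", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()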

Model Performance

  • Accuracy: 0.9599
  • F1 Score: 0.9615
  • Precision: 0.9646
  • Recall: 0.9599
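
For reference, scores in this form are typically computed with scikit-learn (installed below). A minimal sketch, assuming binary 0/1 labels and predictions and weighted averaging; the project's actual evaluation script is not included here:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels and predictions (0 = non-toxic, 1 = toxic)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred, average="weighted"))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall   :", recall_score(y_true, y_pred, average="weighted"))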

Load the Model

from transformers import pipeline

# Load the model and tokenizer
model_name = "prabhaskenche/pk-toxic-comment-classification-using-RoBERTa"
classifier = pipeline("text-classification", model=model_name)

# Example usage
text = "You're the worst person I've ever met."
result = classifier(text)
print(result)
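
For more control over tokenization and per-class scores, the tokenizer and model can also be loaded directly. A sketch assuming the same repository id and a two-label classification head:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "prabhaskenche/pk-toxic-comment-classification-using-RoBERTa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "You're the worst person I've ever met."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities; label names come from model.config.id2label
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs):
    print(model.config.id2label[idx], round(p.item(), 4))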

Installation

Install the required packages:

pip install torch transformers scikit-learn