Toxic Comment Classification Using RoBERTa

Overview

This project provides a toxic comment classification model based on RoBERTa (Robustly optimized BERT approach). The model classifies comments as toxic or non-toxic, helping to moderate online discussions and improve community interactions.

Model Details

  • Model Name: RoBERTa for Toxic Comment Classification
  • Architecture: RoBERTa
  • Parameters: ~355M (FP32 weights)
  • Fine-tuning Task: Binary classification (toxic vs. non-toxic)
  • Evaluation Metrics:
    • Accuracy
    • F1 Score
    • Precision
    • Recall

Files

  • pytorch_model.bin: The trained model weights.
  • config.json: Model configuration file.
  • merges.txt: BPE tokenizer merges file.
  • model.safetensors: Model weights in safetensors format.
  • special_tokens_map.json: Tokenizer special tokens mapping.
  • tokenizer_config.json: Tokenizer configuration file.
  • vocab.json: Tokenizer vocabulary file.
  • roberta-toxic-comment-classifier.pkl: Serialized best model state dictionary (for PyTorch); see the loading sketch after this list.
  • README.md: This documentation file.
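
The roberta-toxic-comment-classifier.pkl file is described above as a serialized state dictionary. If it was saved with torch.save(model.state_dict(), ...), it could be restored roughly as follows; the save format and local file paths here are assumptions, so treat this as a sketch rather than the project's documented loading path:

import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Sketch: rebuild the architecture from the repository's config.json, then
# load the pickled weights. Assumes the .pkl holds a plain PyTorch state_dict.
config = RobertaConfig.from_json_file("config.json")
model = RobertaForSequenceClassification(config)

state_dict = torch.load("roberta-toxic-comment-classifier.pkl", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()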

Model Performance

  • Accuracy: 0.9599
  • F1 Score: 0.9615
  • Precision: 0.9646
  • Recall: 0.9599
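
For reference, scores in this form are typically computed with scikit-learn (installed below). A minimal sketch, assuming binary 0/1 labels and predictions and weighted averaging; the project's actual evaluation script is not included here:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels and predictions (0 = non-toxic, 1 = toxic)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred, average="weighted"))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall   :", recall_score(y_true, y_pred, average="weighted"))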

Load the Model

from transformers import pipeline

# Load the model and tokenizer
model_name = "prabhaskenche/pk-toxic-comment-classification-using-RoBERTa"
classifier = pipeline("text-classification", model=model_name)

# Example usage
text = "You're the worst person I've ever met."
result = classifier(text)
print(result)
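
For more control over tokenization and per-class scores, the tokenizer and model can also be loaded directly. A sketch assuming the same repository id and a two-label classification head:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "prabhaskenche/pk-toxic-comment-classification-using-RoBERTa"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "You're the worst person I've ever met."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities; label names come from model.config.id2label
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs):
    print(model.config.id2label[idx], round(p.item(), 4))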

Installation

Install the required packages:

pip install torch transformers scikit-learn