
roberta-large Image Prompt Classifier

Model Overview

This model is a fine-tuned version of roberta-large designed specifically for classifying image generation prompts into three distinct categories: SAFE, QUESTIONABLE, and UNSAFE. Leveraging the robust capabilities of the roberta-large architecture, this model ensures high accuracy and reliability in identifying the nature of prompts used for generating images.

Model Details

  • Model Name: roberta-large Image Prompt Classifier
  • Base Model: roberta-large
  • Fine-tuned By: Michał Młodawski
  • Model Size: 355M parameters (F32, Safetensors)
  • Categories:
    • 0: SAFE
    • 1: QUESTIONABLE
    • 2: UNSAFE

Use Cases

This model is particularly useful for platforms and applications involving AI-generated content, where it is crucial to filter and classify prompts to maintain content safety and appropriateness. Some potential applications include:

  • Content Moderation: Automatically classify and filter prompts to prevent the generation of inappropriate or harmful images.
  • User Safety: Enhance user experience by ensuring that generated content adheres to safety guidelines.
  • Compliance: Help platforms comply with regulatory requirements by identifying and flagging potentially unsafe prompts.
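For the content-moderation use case, the classifier's three labels map naturally onto allow/review/block decisions. The gating function below is an illustrative sketch, not part of the model itself; the policy choices (e.g. queuing QUESTIONABLE prompts for human review) are assumptions a platform would tune to its own guidelines:

```python
# Hypothetical moderation gate layered on top of the classifier's output.
# Class indices follow the model card's category table.
SAFE, QUESTIONABLE, UNSAFE = 0, 1, 2

def moderate_prompt(predicted_class: int) -> str:
    """Map a predicted class index to a moderation action."""
    if predicted_class == SAFE:
        return "allow"    # pass the prompt to the image generator
    if predicted_class == QUESTIONABLE:
        return "review"   # queue for human review before generating
    return "block"        # refuse generation for UNSAFE prompts

print(moderate_prompt(SAFE))    # allow
print(moderate_prompt(UNSAFE))  # block
```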

How It Works

The model takes an input prompt and classifies it into one of three categories:

  1. SAFE: Prompts that are deemed appropriate and free from harmful content.
  2. QUESTIONABLE: Prompts that may require further review due to potential ambiguity or slight risk.
  3. UNSAFE: Prompts that are likely to generate inappropriate or harmful content.

The classification is based on the semantic understanding and contextual analysis provided by the roberta-large architecture, fine-tuned on a curated dataset tailored for this specific task.
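Concretely, the model emits one raw score (logit) per category, and the predicted class is the one with the highest score; a softmax turns those scores into probabilities if a confidence estimate is needed. The logit values below are made up for demonstration and are not actual model output:

```python
import math

LABELS = ["SAFE", "QUESTIONABLE", "UNSAFE"]

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.1, 0.3, -1.2]  # hypothetical scores for a benign prompt
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(LABELS[predicted])   # SAFE
```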

Performance

Extensive testing and validation have been conducted to ensure the model's performance meets high standards of accuracy and reliability. Key metrics include:

  • Accuracy: 93%
  • Precision: 88%
  • Recall: 90%

These metrics indicate that the model distinguishes between the three prompt categories with high confidence, while leaving room for a small fraction of misclassifications that a human-review step can catch.
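The reported numbers come from the author's own validation set, which is not published here. As a reminder of what each metric measures, the sketch below computes accuracy and per-class precision/recall from toy label vectors (the data is illustrative, not the model's actual validation results):

```python
# Toy illustration of accuracy, precision, and recall for a 3-class task.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, cls):
    # Precision: of everything predicted as `cls`, how much really was `cls`?
    # Recall: of everything that really was `cls`, how much did we find?
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [0, 0, 1, 2, 2, 1]  # made-up ground-truth classes
y_pred = [0, 1, 1, 2, 2, 2]  # made-up predictions
print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, 2))
```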

Getting Started

To start using the roberta-large Image Prompt Classifier, you can easily integrate it into your existing workflows with the following steps:

Installation

pip install transformers==4.37.2
pip install torch==2.3.1

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch


model_repo = "MichalMlodawski/nsfw-text-detection-large"

# Load the tokenizer and fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_repo)
model = AutoModelForSequenceClassification.from_pretrained(model_repo)

def classify_text(text):
    # Tokenize and truncate to the model's 512-token input limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    
    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Pick the highest-scoring class: 0 = SAFE, 1 = QUESTIONABLE, 2 = UNSAFE
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()
    
    return predicted_class

if __name__ == "__main__":
    examples = [
        {"text": "wild west cowboy caravan through Arizona Desert , Men Woman and children, horse drawn carriages", "expected_class": 0},
        {"text": "A person with questionable intentions in a dimly lit alley.", "expected_class": 1},
        {"text": "An inappropriate scene involving explicit content.", "expected_class": 2}
    ]
    
    for example in examples:
        text = example["text"]
        expected_class = example["expected_class"]
        predicted_class = classify_text(text)
        print(f"Prompt: {text}")
        print(f"Predicted class: {predicted_class}")
        print(f"Expected class: {expected_class}")
        print("---")
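The script above prints raw class indices. To present human-readable labels instead, a small mapping can be layered on top; the label names below follow the category table in this card (the checkpoint's own `id2label` config may also carry them, but that depends on how it was saved):

```python
# Map class indices to the labels from the model card's category table.
ID2LABEL = {0: "SAFE", 1: "QUESTIONABLE", 2: "UNSAFE"}

def label_for(predicted_class: int) -> str:
    """Return the human-readable label for a predicted class index."""
    return ID2LABEL.get(predicted_class, "UNKNOWN")

print(label_for(1))  # QUESTIONABLE
```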

Disclaimer

The training dataset includes real image-generation prompts that may be perceived as abusive, offensive, or obscene. The examples and data may also contain unfavorable information about certain businesses. This data was collected as-is, and no legal responsibility is assumed for its content. Please note: a portion of the data was created using Large Language Models (LLMs).
