AI Detect Model

Model Description

You can explore the source code and additional resources in our GitHub repository: https://github.com/wangkevin02/USP

The AI Detect Model is a binary classification model designed to determine whether a given text is AI-generated (label=1) or written by a human (label=0). This model plays a crucial role in providing AI detection rewards, helping to prevent reward hacking during Reinforcement Learning with Cycle Consistency (RLCC). For more details, please refer to our paper.

This model is built on the Longformer architecture and trained on our proprietary LMSYS-USP dataset. Specifically, in a dialogue context, texts generated by the assistant are labeled as AI-generated (label=1), while texts written by the user are labeled as human (label=0).
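
This labeling rule is straightforward to apply to any role-annotated dialogue. The snippet below is only an illustrative sketch of that rule; the message format is an assumption on our part and this is not the actual LMSYS-USP preprocessing code.

# Illustrative labeling rule: assistant turns -> label=1 (AI-generated), user turns -> label=0 (human).
dialogue = [
    {"role": "user", "content": "I am thinking about going away for vacation"},
    {"role": "assistant", "content": "How can I help you today?"},
]
labeled = [(turn["content"], 1 if turn["role"] == "assistant" else 0) for turn in dialogue]
# >>> [('I am thinking about going away for vacation', 0), ('How can I help you today?', 1)]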

Note: Our model is subject to the following constraints:

  1. Maximum Context Length: Supports up to 4,096 tokens. Exceeding this limit may degrade performance, so keep inputs within it for best results (a simple token-count check is sketched after this list).
  2. Language Limitation: Optimized for English. Non-English performance may vary due to limited training data.
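
Because the tokenizer call in the Quick Start below truncates anything beyond the maximum length, it can be useful to count tokens in long dialogues before scoring them. The following is a minimal sketch of such a check; "/path/to/ai_detector" is a placeholder for wherever you store the downloaded model.

from transformers import LongformerTokenizer

# Count tokens up front so that silent truncation at 4,096 tokens does not go unnoticed.
tokenizer = LongformerTokenizer.from_pretrained("/path/to/ai_detector")
text = "Some long dialogue transcript ..."
n_tokens = len(tokenizer(text)["input_ids"])
if n_tokens > 4096:
    print(f"Warning: input has {n_tokens} tokens; tokens beyond 4,096 will be truncated.")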

Quick Start

You can use our AI detection model as shown below:

from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F

class AIDetector:
    def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
        """
        Initialize the AIDetector with a pretrained Longformer model and tokenizer.

        Args:
            model_name (str): The name or path of the pretrained Longformer model.
            max_length (int): The maximum sequence length for tokenization.
        """
        self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
        self.model = LongformerForSequenceClassification.from_pretrained(model_name)
        self.model.eval()
        self.max_length = max_length
        self.tokenizer.padding_side = "right"

    @torch.no_grad()
    def get_probability(self, texts):
        """Return a (batch_size, 2) probability tensor per text:
        column 0 = human-written (label=0), column 1 = AI-generated (label=1)."""
        inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=self.max_length, return_tensors='pt')
        outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
        return probabilities

# Example usage
if __name__ == "__main__":
    classifier = AIDetector(model_name="/path/to/ai_detector")  # replace with the local path to the downloaded model weights
    target_text = [
        "I am thinking about going away for vacation",
        "How can I help you today?",
    ]
    result = classifier.get_probability(target_text)
    print(result)
    # >>> Expected Output:
    # >>> tensor([[0.9954, 0.0046],
    # >>>         [0.0265, 0.9735]])    
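
Each row of the returned tensor holds the softmax probabilities for label=0 (human-written) and label=1 (AI-generated). If you need a single AI-likelihood score per text, for example to use as a detection reward, one minimal post-processing sketch continuing the example above is shown below; the variable names are ours and not part of the released code.

# Column 1 corresponds to label=1 (AI-generated).
ai_probs = result[:, 1]                 # AI-generated probability per input text
predictions = (ai_probs > 0.5).long()   # 1 = AI-generated, 0 = human-written
print(ai_probs.tolist(), predictions.tolist())
# >>> (given the example output above) [0.0046, 0.9735] [0, 1]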

Citation

If you find this model useful, please cite:

@misc{wang2025knowbettermodelinghumanlike,
      title={Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles}, 
      author={Kuang Wang and Xianfei Li and Shenghao Yang and Li Zhou and Feng Jiang and Haizhou Li},
      year={2025},
      eprint={2502.18968},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18968}, 
}