AI Detect Model
Model Description
GitHub repository for exploring the source code and additional resources: https://github.com/wangkevin02/USP
The AI Detect Model is a binary classification model designed to determine whether a given text is AI-generated (label=1) or written by a human (label=0). This model plays a crucial role in providing AI detection rewards, helping to prevent reward hacking during Reinforcement Learning with Cycle Consistency (RLCC). For more details, please refer to our paper.
This model is built upon the Longformer architecture and trained using our proprietary LMSYS-USP dataset. Specifically, in a dialogue context, texts generated by the assistant are labeled as AI-generated (label=1), while user-generated texts are assigned the opposite label (label=0).
Note: Our model is subject to the following constraints:
- Maximum Context Length: Supports up to 4,096 tokens. Exceeding this may degrade performance; keep inputs within this limit for best results.
- Language Limitation: Optimized for English. Non-English performance may vary due to limited training data.
Quick Start
You can utilize our AI detection model as demonstrated below:
from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F
class AIDetector:
def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
"""
Initialize the AIDetector with a pretrained Longformer model and tokenizer.
Args:
model_name (str): The name or path of the pretrained Longformer model.
max_length (int): The maximum sequence length for tokenization.
"""
self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
self.model = LongformerForSequenceClassification.from_pretrained(model_name)
self.model.eval()
self.max_length = max_length
self.tokenizer.padding_side = "right"
@torch.no_grad()
def get_probability(self, texts):
inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=self.max_length, return_tensors='pt')
outputs = self.model(**inputs)
probabilities = F.softmax(outputs.logits, dim=1)
return probabilities
# Example usage
if __name__ == "__main__":
classifier = AIDetector(model_name="/path/to/ai_detector")
target_text = [
"I am thinking about going away for vacation",
"How can I help you today?"
]
result = classifier.get_probability(target_text)
print(result)
# >>> Expected Output:
# >>> tensor([[0.9954, 0.0046],
# >>> [0.0265, 0.9735]])
Citation
If you find this model useful, please cite:
@misc{wang2025knowbettermodelinghumanlike,
title={Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles},
author={Kuang Wang and Xianfei Li and Shenghao Yang and Li Zhou and Feng Jiang and Haizhou Li},
year={2025},
eprint={2502.18968},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.18968},
}
- Downloads last month
- 94
Model tree for wangkevin02/AI_Detect_Model
Base model
allenai/longformer-base-4096