---
license: mit
datasets:
- wangkevin02/LMSYS-USP
language:
- en
metrics:
- accuracy
base_model:
- allenai/longformer-base-4096
---
# AI Detect Model
## Model Description
> **GitHub repository** for exploring the source code and additional resources: https://github.com/wangkevin02/USP

The **AI Detect Model** is a binary classification model designed to determine whether a given text is AI-generated (label=1) or human-written (label=0). It plays a crucial role in providing AI detection rewards, helping to prevent reward hacking during Reinforcement Learning with Cycle Consistency (RLCC). For more details, please refer to [our paper](https://arxiv.org/pdf/2502.18968).

This model is built upon the [Longformer](https://huggingface.co/allenai/longformer-base-4096) architecture and trained on our [LMSYS-USP](https://huggingface.co/datasets/wangkevin02/LMSYS-USP) dataset. Specifically, in a dialogue context, texts generated by the assistant are labeled as AI-generated (label=1), while user-written texts receive the opposite label (label=0).
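For illustration, this labeling rule amounts to the following mapping from dialogue turns to training pairs. This is a minimal sketch of the rule as described above, not the actual LMSYS-USP preprocessing pipeline, which may differ:

```python
# Hypothetical sketch of the labeling rule described above:
# assistant turns -> AI-generated (label=1), user turns -> human-written (label=0).
dialogue = [
    {"role": "user", "content": "I am thinking about going away for vacation"},
    {"role": "assistant", "content": "How can I help you today?"},
]

examples = [
    (turn["content"], 1 if turn["role"] == "assistant" else 0)
    for turn in dialogue
]
print(examples)
# [('I am thinking about going away for vacation', 0),
#  ('How can I help you today?', 1)]
```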
> *Note*: Our model is subject to the following constraints:
>
> 1. **Maximum Context Length**: Supports up to **4,096 tokens**. Exceeding this limit may degrade performance; keep inputs within it for best results (see the length-check sketch below).
> 2. **Language Limitation**: Optimized for English. Non-English performance may vary due to limited training data.
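
As a quick sanity check against the 4,096-token limit, you can count tokens before sending text to the classifier. This sketch assumes the fine-tuned checkpoint uses the same tokenizer as the base `allenai/longformer-base-4096` model:

```python
from transformers import LongformerTokenizer

# Assumption: the fine-tuned detector reuses the base Longformer tokenizer.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

text = "I am thinking about going away for vacation"
n_tokens = len(tokenizer(text)["input_ids"])
if n_tokens > 4096:
    print(f"Input is {n_tokens} tokens; everything past 4,096 will be truncated.")
```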
## Quick Start
You can utilize our AI detection model as demonstrated below:
```python
from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F


class AIDetector:
    def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
        """
        Initialize the AIDetector with a pretrained Longformer model and tokenizer.

        Args:
            model_name (str): The name or path of the pretrained Longformer model.
            max_length (int): The maximum sequence length for tokenization.
        """
        self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
        self.model = LongformerForSequenceClassification.from_pretrained(model_name)
        self.model.eval()  # inference mode: disables dropout
        self.max_length = max_length
        self.tokenizer.padding_side = "right"

    @torch.no_grad()
    def get_probability(self, texts):
        """Return class probabilities for a batch of texts.

        Column 0 is the probability of "human-written", column 1 of "AI-generated".
        """
        # Tokenize the batch, padding to the longest text and truncating
        # anything beyond `max_length` (4,096 tokens for Longformer).
        inputs = self.tokenizer(texts, padding=True, truncation=True,
                                max_length=self.max_length, return_tensors='pt')
        outputs = self.model(**inputs)
        # Convert the two-class logits into probabilities.
        probabilities = F.softmax(outputs.logits, dim=1)
        return probabilities


# Example usage
if __name__ == "__main__":
    classifier = AIDetector(model_name="/path/to/ai_detector")
    target_text = [
        "I am thinking about going away for vacation",
        "How can I help you today?"
    ]
    result = classifier.get_probability(target_text)
    print(result)
    # >>> Expected Output:
    # >>> tensor([[0.9954, 0.0046],
    # >>>         [0.0265, 0.9735]])
```
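
To turn these probabilities into hard labels, take the argmax over the class dimension: column 0 is the human-written probability and column 1 the AI-generated probability. A small follow-up sketch, reusing `classifier` and `target_text` from the example above:

```python
probs = classifier.get_probability(target_text)
labels = probs.argmax(dim=1)   # 0 = human-written, 1 = AI-generated
ai_scores = probs[:, 1]        # probability that each text is AI-generated

for text, label, score in zip(target_text, labels, ai_scores):
    verdict = "AI-generated" if label.item() == 1 else "human-written"
    print(f"{text!r} -> {verdict} (p_AI = {score:.4f})")
```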
## Citation
If you find this model useful, please cite:
```bibtex
@misc{wang2025knowbettermodelinghumanlike,
title={Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles},
author={Kuang Wang and Xianfei Li and Shenghao Yang and Li Zhou and Feng Jiang and Haizhou Li},
year={2025},
eprint={2502.18968},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.18968},
}
```