---
license: mit
datasets:
- wangkevin02/LMSYS-USP
language:
- en
metrics:
- accuracy
base_model:
- allenai/longformer-base-4096
---
# AI Detect Model

## Model Description

> **GitHub repository** for exploring the source code and additional resources: https://github.com/wangkevin02/USP

The **AI Detect Model** is a binary classification model designed to determine whether a given text is AI-generated (label=1) or written by a human (label=0). This model plays a crucial role in providing AI detection rewards, helping to prevent reward hacking during Reinforcement Learning with Cycle Consistency (RLCC). For more details, please refer to [our paper](https://arxiv.org/pdf/2502.18968).

This model is built upon the [Longformer](https://huggingface.co/allenai/longformer-base-4096) architecture and trained using our proprietary [LMSYS-USP](https://huggingface.co/datasets/wangkevin02/LMSYS-USP) dataset. Specifically, in a dialogue context, texts generated by the assistant are labeled as AI-generated (label=1), while user-generated texts are assigned the opposite label (label=0).

> *Note*: Our model is subject to the following constraints:
>
> 1. **Maximum Context Length**: Supports up to **4,096 tokens**. Exceeding this may degrade performance; keep inputs within this limit for best results.
> 2. **Language Limitation**: Optimized for English. Non-English performance may vary due to limited training data.


## Quick Start

You can utilize our AI detection model as demonstrated below:

```python
from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F

class AIDetector:
    def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
        """
        Initialize the AIDetector with a pretrained Longformer model and tokenizer.

        Args:
            model_name (str): The name or path of the pretrained Longformer model.
            max_length (int): The maximum sequence length for tokenization.
        """
        self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
        self.model = LongformerForSequenceClassification.from_pretrained(model_name)
        self.model.eval()
        self.max_length = max_length
        self.tokenizer.padding_side = "right"

    @torch.no_grad()
    def get_probability(self, texts):
        inputs = self.tokenizer(texts, padding=True, truncation=True, max_length=self.max_length, return_tensors='pt')
        outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
        return probabilities

# Example usage
if __name__ == "__main__":
    classifier = AIDetector(model_name="/path/to/ai_detector")
    target_text = [
        "I am thinking about going away for vacation",
        "How can I help you today?"
        ]
    result = classifier.get_probability(target_text)
    print(result)
    # >>> Expected Output:
    # >>> tensor([[0.9954, 0.0046],
    # >>>         [0.0265, 0.9735]])    
```


## Citation

If you find this model useful, please cite:

```plaintext
@misc{wang2025knowbettermodelinghumanlike,
      title={Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles}, 
      author={Kuang Wang and Xianfei Li and Shenghao Yang and Li Zhou and Feng Jiang and Haizhou Li},
      year={2025},
      eprint={2502.18968},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18968}, 
}
```