---
license: mit
datasets:
- wangkevin02/LMSYS-USP
language:
- en
metrics:
- accuracy
base_model:
- allenai/longformer-base-4096
---

# AI Detect Model

## Model Description

> **GitHub repository** for exploring the source code and additional resources: https://github.com/wangkevin02/USP

The **AI Detect Model** is a binary classifier that predicts whether a given text is AI-generated (label=1) or written by a human (label=0). It supplies the AI detection reward used during Reinforcement Learning with Cycle Consistency (RLCC), which helps prevent reward hacking. For more details, please refer to [our paper](https://arxiv.org/pdf/2502.18968).

The model is built on the [Longformer](https://huggingface.co/allenai/longformer-base-4096) architecture and trained on our [LMSYS-USP](https://huggingface.co/datasets/wangkevin02/LMSYS-USP) dataset. Within each dialogue, assistant turns are labeled as AI-generated (label=1) and user turns are labeled as human-written (label=0).

> *Note*: Our model is subject to the following constraints:
>
> 1. **Maximum Context Length**: Supports up to **4,096 tokens**. Exceeding this may degrade performance; keep inputs within this limit for best results.
> 2. **Language Limitation**: Optimized for English. Non-English performance may vary due to limited training data.

## Quick Start

You can use the AI detection model as shown below:

```python
from transformers import LongformerTokenizer, LongformerForSequenceClassification
import torch
import torch.nn.functional as F


class AIDetector:
    def __init__(self, model_name="allenai/longformer-base-4096", max_length=4096):
        """
        Initialize the AIDetector with a pretrained Longformer model and tokenizer.

        Args:
            model_name (str): The name or path of the pretrained Longformer model.
                Pass the AI Detect Model checkpoint here; the default base
                Longformer has no trained classification head.
            max_length (int): The maximum sequence length for tokenization.
        """
        self.tokenizer = LongformerTokenizer.from_pretrained(model_name)
        self.model = LongformerForSequenceClassification.from_pretrained(model_name)
        self.model.eval()  # inference mode: disables dropout
        self.max_length = max_length
        self.tokenizer.padding_side = "right"

    @torch.no_grad()
    def get_probability(self, texts):
        # Tokenize the batch, padding to the longest text and truncating to max_length.
        inputs = self.tokenizer(texts, padding=True, truncation=True,
                                max_length=self.max_length, return_tensors='pt')
        outputs = self.model(**inputs)
        # Each row is [P(human, label=0), P(AI-generated, label=1)].
        probabilities = F.softmax(outputs.logits, dim=1)
        return probabilities


# Example usage
if __name__ == "__main__":
    classifier = AIDetector(model_name="/path/to/ai_detector")
    target_text = [
        "I am thinking about going away for vacation",
        "How can I help you today?"
    ]
    result = classifier.get_probability(target_text)
    print(result)
    # >>> Expected Output:
    # >>> tensor([[0.9954, 0.0046],
    # >>>         [0.0265, 0.9735]])
```

## Citation

If you find this model useful, please cite:

```plaintext
@misc{wang2025knowbettermodelinghumanlike,
      title={Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles},
      author={Kuang Wang and Xianfei Li and Shenghao Yang and Li Zhou and Feng Jiang and Haizhou Li},
      year={2025},
      eprint={2502.18968},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18968},
}
```