SA-Yur-or committed
Commit 507dc96 · 1 Parent(s): e6499c9

[up]: Update README

Files changed (1):
  1. README.md +39 -18
README.md CHANGED
@@ -23,7 +23,7 @@ tags:
 
  <h1 align="center">SuperAnnotate</h1>
  <h3 align="center">
- LLM Content Detector V2<br/>
+ AI Detector<br/>
  Fine-Tuned RoBERTa Large<br/>
  </h3>
 
@@ -48,18 +48,18 @@ Couple of articles about this problem: [*Problems with Synthetic Data*](https://
 
  ### Training Data
 
- The training data with 'human' label was sourced from three open datasets with equal proportions:
-
- 1. [**Wikipedia**](https://huggingface.co/datasets/wikimedia/wikipedia)
- 1. [**Reddit ELI5 QA**](https://huggingface.co/datasets/rexarski/eli5_category)
- 1. [**Scientific Papers**](https://www.tensorflow.org/datasets/catalog/scientific_papers) extended version with full text of sections
-
- The second half of the dataset was obtained by generating answers to the corresponding human texts.
- For generation, 14 models from 4 different families were used, namely: GPT, LLaMA, Anthropic and Mistral
-
- As a result, the training dataset contained approximately ***36k*** pairs of text-label with an approximate balance of classes. \
- It's worth noting that the dataset's texts follow a logical structure: \
- Human-written and model-generated texts refer to a single prompt/instruction, though the prompts themselves were not used during training.
+ The training dataset for this version includes **44k text-label pairs**, split equally between two parts:
+
+ 1. **Custom Generation**: The first half of the dataset was generated with custom, specially designed prompts, while the human-written counterparts were sourced from three domains:
+    - [**Wikipedia**](https://huggingface.co/datasets/wikimedia/wikipedia)
+    - [**Reddit ELI5 QA**](https://huggingface.co/datasets/rexarski/eli5_category)
+    - [**Scientific Papers**](https://www.tensorflow.org/datasets/catalog/scientific_papers) (extended to include the full text of sections).
+
+    Texts were generated by 14 different models across four major LLM families (GPT, LLaMA, Anthropic, and Mistral). Each sample pairs a single prompt with one human-written and one generated response, though the prompts themselves were excluded from the training inputs.
+
+ 2. **RAID Train Data Stratified Subset**: The second half is a carefully selected stratified subset of the RAID train set, ensuring equal representation across domains, model types, and attack methods. Each example pairs a human-authored text with a corresponding machine-generated response (produced by a single model with specific parameters and attacks applied).
+
+ This balanced structure maintains approximately equal proportions of human and generated samples, ensuring that each prompt aligns with one authentic and one generated answer.
 
  > [!NOTE]
  > Furthermore, key n-grams (n ranging from 2 to 5) that exhibited the highest correlation with target labels were identified and subsequently removed from the training data utilizing the chi-squared test.
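
The n-gram filtering described in this note is not included in the commit itself; the sketch below shows one way it could be done with scikit-learn. The `texts`/`labels` variables and the top-k cutoff are illustrative assumptions, not the repository's actual preprocessing code.

```python
# Illustrative sketch: rank n-grams (n = 2..5) by chi-squared correlation with
# the label, then strip the most label-correlated ones from the training texts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

texts = [
    "An example of a human-written answer to some prompt.",
    "An example of a model-generated answer to the same prompt.",
]
labels = [0, 1]  # 0 = human, 1 = generated

vectorizer = CountVectorizer(ngram_range=(2, 5), lowercase=False)
counts = vectorizer.fit_transform(texts)

scores, _ = chi2(counts, labels)              # chi-squared statistic per n-gram
ngrams = vectorizer.get_feature_names_out()

top_k = 100                                   # cutoff is an arbitrary choice here
leaky = [ngrams[i] for i in scores.argsort()[::-1][:top_k]]

# Rough removal by substring replacement (the real pipeline may differ).
cleaned = []
for text in texts:
    for ngram in leaky:
        text = text.replace(ngram, " ")
    cleaned.append(text)
```

Chi-squared scores here simply rank n-grams by how strongly their counts separate the two classes; removing the top-ranked ones reduces shortcut features the classifier could otherwise latch onto.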
@@ -67,9 +67,7 @@ Human-written and model-generated texts refer to a single prompt/instruction, th
  ### Peculiarity
 
  During training, one of the priorities was not only maximizing the quality of predictions but also avoiding overfitting and obtaining an adequately confident predictor. \
- We are pleased to achieve the following state of model calibration:
-
- **TODO** Change graph or this section in general.
+ We are pleased to have achieved both good model calibration and highly accurate predictions.
 
  ## Usage
 
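Calibration itself is not demonstrated in this diff; a quick way to check it on a held-out split is a reliability curve, sketched below with scikit-learn on made-up placeholder labels and probabilities.

```python
# Sketch: checking calibration on a held-out set (placeholder data, not from the repo).
import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])                  # 1 = AI-generated
y_prob = np.array([0.1, 0.3, 0.8, 0.7, 0.9, 0.2, 0.6, 0.4, 0.95, 0.85])

# For a well-calibrated detector, the fraction of positives per bin
# should track the mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
ece = np.abs(frac_pos - mean_pred).mean()    # simple, unweighted calibration error
print(frac_pos, mean_pred, round(float(ece), 3))
```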
@@ -83,8 +81,8 @@ from transformers import AutoTokenizer
  import torch.nn.functional as F
 
 
- model = RobertaClassifier.from_pretrained("SuperAnnotate/roberta-large-llm-content-detector-V2")
- tokenizer = AutoTokenizer.from_pretrained("SuperAnnotate/roberta-large-llm-content-detector-V2")
+ model = RobertaClassifier.from_pretrained("SuperAnnotate/ai-detector")
+ tokenizer = AutoTokenizer.from_pretrained("SuperAnnotate/ai-detector")
 
  text_example = "It's not uncommon for people to develop allergies or intolerances to certain foods as they get older. It's possible that you have always had a sensitivity to lactose (the sugar found in milk and other dairy products), but it only recently became a problem for you. This can happen because our bodies can change over time and become more or less able to tolerate certain things. It's also possible that you have developed an allergy or intolerance to something else that is causing your symptoms, such as a food additive or preservative. In any case, it's important to talk to a doctor if you are experiencing new allergy or intolerance symptoms, so they can help determine the cause and recommend treatment."
 
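The hunk only shows the lines that changed, so the rest of the README's inference snippet is not visible here. The sketch below shows how the loaded model and tokenizer would typically be applied; it assumes the custom `RobertaClassifier` head returns a single logit per text, which is an assumption since the class definition is not part of this diff.

```python
# Continuation sketch; the single-logit output shape is an assumption.
tokens = tokenizer(
    text_example,
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
logits = model(**tokens)                     # assumed shape: [1, 1]
proba = F.sigmoid(logits).squeeze().item()   # probability the text is AI-generated
print(f"P(AI-generated) = {proba:.3f}")
```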
@@ -114,13 +112,36 @@ A custom architecture was chosen for its ability to perform binary classificatio
  - **Base Model**: [FacebookAI/roberta-large](https://huggingface.co/FacebookAI/roberta-large)
  - **Epochs**: 20
  - **Learning Rate**: 5e-05
- - **Weight Decay**: 0.001
- - **Label Smoothing**: 0.27
+ - **Weight Decay**: 0.0033
+ - **Label Smoothing**: 0.38
  - **Warmup Epochs**: 2
  - **Optimizer**: SGD
  - **Gradient Clipping**: 3.0
  - **Scheduler**: Cosine with hard restarts
+ - **Number of Scheduler Cycles**: 6
 
  ## Performance
 
- **TODO** RAID Leaderboard should be here
+ This solution has been validated on a stratified subset of the [RAID](https://raid-bench.xyz/) train dataset, a diverse benchmark covering:
+ - 11 LLMs
+ - 11 adversarial attacks
+ - 8 domains
+
+ The detector's accuracy by text source:
+
+ | Model         | Accuracy |
+ |---------------|----------|
+ | ***Human***   | 0.731    |
+ | ChatGPT       | 0.992    |
+ | GPT-2         | 0.649    |
+ | GPT-3         | 0.945    |
+ | GPT-4         | 0.985    |
+ | LLaMA-Chat    | 0.980    |
+ | Mistral       | 0.644    |
+ | Mistral-Chat  | 0.975    |
+ | Cohere        | 0.823    |
+ | Cohere-Chat   | 0.906    |
+ | MPT           | 0.757    |
+ | MPT-Chat      | 0.943    |
+ | **Average**   | **0.852** |
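
The hyperparameters listed above map directly onto a plain PyTorch + `transformers` setup. Below is a self-contained sketch with a stand-in model and data; the loop structure and the binary label-smoothing formulation are assumptions, not the repository's training script.

```python
# Sketch of the listed optimizer/scheduler settings, with stand-ins so it runs.
import torch
from transformers import get_cosine_with_hard_restarts_schedule_with_warmup

EPOCHS, WARMUP_EPOCHS, NUM_CYCLES = 20, 2, 6
LR, WEIGHT_DECAY, LABEL_SMOOTHING, MAX_GRAD_NORM = 5e-05, 0.0033, 0.38, 3.0

model = torch.nn.Linear(8, 1)  # stand-in for the RoBERTa-large classifier
train_loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(10)]
steps_per_epoch = len(train_loader)

optimizer = torch.optim.SGD(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer,
    num_warmup_steps=WARMUP_EPOCHS * steps_per_epoch,
    num_training_steps=EPOCHS * steps_per_epoch,
    num_cycles=NUM_CYCLES,
)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(EPOCHS):
    for features, labels in train_loader:
        optimizer.zero_grad()
        logits = model(features).squeeze(-1)
        # Binary label smoothing: pull the hard 0/1 targets toward 0.5.
        targets = labels.float() * (1 - LABEL_SMOOTHING) + 0.5 * LABEL_SMOOTHING
        loss = loss_fn(logits, targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
        scheduler.step()
```

With hard restarts, the learning rate decays along a cosine curve and jumps back to its peak at each of the 6 cycle boundaries after the initial 2-epoch warmup.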