Alex Tyshka committed on
Commit
d5b293a
·
1 Parent(s): e5cadbb

Add documentation and notebook

  1. README.md +64 -1
  2. app.py +2 -1
  3. main.ipynb +0 -0
README.md CHANGED
@@ -10,4 +10,67 @@ pinned: false
10
  license: gpl-3.0
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
10
  license: gpl-3.0
11
  ---
12
 
13
+ # Introduction
14
+ This project presents a tool that predicts whether a given text is generated by a large language model (LLM) such as ChatGPT.
15
+ To do this, a machine learning model analyzes patterns in word choice and sentence structure that are typical of humans or LLMs.
16
+ The system outputs a prediction with a confidence score and shows the factors that led to its decision.
17
+ These factors are output in the form of percentiles based on text the model has seen before.
18
+ This tool is not 100% accurate: it can misclassify human-written text as AI-written and vice versa, so it should not be used as a sole measure for cheating detection.
19
+
20
+ # Usage
21
+ To run this project, visit https://huggingface.co/spaces/atyshka/ai-detector. The interface allows you to enter any text and click "Submit" to evaluate it.
22
+ On the right, you will see 3 outputs:
23
+
24
+ First, you will see the prediction from the model (Human or AI) and the confidence score.
25
+ Next, you will see the factors that contributed to this decision.
26
+ The percentages do not represent how much each feature contributed to the decision; rather, they are percentiles indicating whether the feature is high or low compared to other text the model has seen.
27
+ For example, a perplexity percentile of 95% indicates that the perplexity (i.e., usage of rare words) is very high compared to previously seen text, while 5% indicates very low perplexity.
28
+ Finally, you will see a visualization of the perplexity, where words are highlighted according to their "rareness".
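The percentile outputs described above can be illustrated with a minimal sketch. It assumes a percentile is the share of previously seen feature values that are less than or equal to the new value; the function name `percentile_rank` and the reference scores are hypothetical and are not taken from the project's code.

```python
from bisect import bisect_right


def percentile_rank(value, reference_values):
    """Return the percentage of reference values that are <= value."""
    ordered = sorted(reference_values)
    if not ordered:
        raise ValueError("reference_values must not be empty")
    # bisect_right counts how many sorted reference values are <= value.
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Hypothetical perplexity scores from texts seen earlier (illustrative only).
reference_perplexities = [2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.2, 4.8, 5.5, 6.0]

print(percentile_rank(5.0, reference_perplexities))
```

A high result means the feature is unusually large relative to the reference texts; it says nothing about how strongly that feature influenced the prediction.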
29
+
30
+ At the bottom of the page, there are 4 examples.
31
+ Examples 1 & 3 are written by ChatGPT, while 2 & 4 are human-written.
32
+
33
+ Feel free to modify these examples or generate your own samples from scratch to see how the model scores change.
34
+
35
+ # Documentation
36
+ The heart of this project is a logistic regression model that uses features from the text to predict whether the text is written by an LLM.
37
+ This model uses 4 input features: perplexity, mean sentence length, the standard deviation of sentence length, and the Shapiro-Wilk p-value for sentence length.
38
+ Perplexity is measured as the negative log likelihood of the sequence, while the Shapiro-Wilk p-value measures how well the sentence lengths fit a normal distribution.
39
+ I also experimented with a multilayer perceptron, but the F1 score improved by only 0.01 (0.93 to 0.94).
40
+ Given the better interpretability of logistic regression, I decided to keep that approach for the final product.
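As a rough illustration of the sentence-length features, the sketch below computes the mean and standard deviation of sentence lengths using only the standard library. The regex sentence splitter, the word-count definition of length, and the use of the sample standard deviation are all assumptions; the project itself reports using NLTK and SciPy for feature extraction, and the Shapiro-Wilk p-value would come from a statistics library (for example `scipy.stats.shapiro`) rather than from this code.

```python
import re
import statistics


def sentence_length_features(text):
    """Return (lengths, mean, std) of sentence lengths measured in words."""
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        raise ValueError("text contains no sentences")
    mean_len = statistics.mean(lengths)
    # Sample standard deviation needs at least two sentences.
    std_len = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return lengths, mean_len, std_len


sample = "One two three. Four five! Six seven eight nine ten?"
print(sentence_length_features(sample))
```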
41
+
42
+ To calculate perplexity, the system uses the GPT-2 Large model, an autoregressive decoder-only transformer with 774M parameters.
43
+ This model was trained by OpenAI on the WebText dataset, which consists of web pages reached through outbound links from Reddit.
44
+ OpenAI did not share the training time, epochs, or other specifics of the training procedure, simply noting that the learning rate was manually tuned.
45
+ GPT-2 can contain many biases and factual inaccuracies; however, because the model is not used generatively in this project, these problems are largely irrelevant.
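To make the perplexity feature concrete, here is a minimal sketch that starts from per-token log-probabilities, such as a language model like GPT-2 would assign to each token of a text. The log-probabilities below are made up for illustration, and the function names are hypothetical. The sketch shows both the mean negative log-likelihood, which is how this README describes the feature, and its exponential, which is the conventional definition of perplexity.

```python
import math


def mean_negative_log_likelihood(token_log_probs):
    """Average negative log-likelihood (natural log) over the tokens."""
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    return -sum(token_log_probs) / len(token_log_probs)


def perplexity(token_log_probs):
    """Conventional perplexity: exp of the mean negative log-likelihood."""
    return math.exp(mean_negative_log_likelihood(token_log_probs))


# Hypothetical token probabilities 1/2, 1/4 and 1/8 (illustrative only).
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]

print(mean_negative_log_likelihood(log_probs))
print(perplexity(log_probs))
```

Both quantities rise and fall together, so either one ranks texts the same way; text made of tokens the model finds unlikely scores higher.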
46
+
47
+ To train the classifier, I use the GPT Wiki Intro dataset, which consists of Wikipedia intro paragraphs written by GPT-3.
48
+ The Curie model variant is used to generate intro paragraphs given the title of the article and the first 7 words as a prompt.
49
+ The dataset contains 150k topics; the creators did not share how these topics were selected.
50
+ Each LLM-written paragraph is paired with the human-written version; having paired examples from the same domain is why I chose this dataset.
51
+ For computational efficiency, I use only 4000 topics for training and 1000 for testing.
52
+
53
+ The project uses scikit-learn for the logistic regression model, Hugging Face/PyTorch for the GPT-2 model, and Gradio for the user interface.
54
+ NLTK and SciPy are also used to calculate the features for the model input.
55
+
56
+ # Contributions
57
+ A perplexity-based LLM detector is not a new idea; many solutions, such as GPTZero, were published in the wake of ChatGPT.
58
+ However, most of the popular solutions are not open-source.
59
+ The initial design was based on Hugging Face's perplexity example (https://huggingface.co/docs/transformers/perplexity), but it required substantial modification to obtain word-level perplexities and then map subword tokens back to full words.
60
+ Then I added the other statistics on the distribution of sentence lengths and evaluated two types of classifiers.
61
+ I also planned to add synonym-frequency usage with WordNet as a feature; however, there was not sufficient time to implement this.
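The step of mapping subword tokens back to full words, mentioned earlier in this section, might look roughly like the sketch below. It assumes GPT-2 style byte-level BPE tokens, where a leading "Ġ" marks a token that begins a new word, and it assumes that a word's score is the sum of its subword scores. The function name, the example tokens, and the scores are hypothetical; the project's actual aggregation may differ.

```python
def merge_subword_scores(tokens, token_nlls, word_start_marker="Ġ"):
    """Group subword tokens into words and sum their negative log-likelihoods."""
    if len(tokens) != len(token_nlls):
        raise ValueError("tokens and token_nlls must have the same length")
    words = []  # each entry is [word_text, summed_nll]
    for token, nll in zip(tokens, token_nlls):
        starts_new_word = token.startswith(word_start_marker) or not words
        # Drop the marker so only the visible characters remain.
        if token.startswith(word_start_marker):
            piece = token[len(word_start_marker):]
        else:
            piece = token
        if starts_new_word:
            words.append([piece, nll])
        else:
            # Continuation piece: extend the current word and add its score.
            words[-1][0] += piece
            words[-1][1] += nll
    return [(text, score) for text, score in words]


# Hypothetical tokenization and per-token scores (illustrative only).
tokens = ["The", "Ġper", "plex", "ity", "Ġrose"]
scores = [1.0, 2.0, 0.5, 0.25, 3.0]
print(merge_subword_scores(tokens, scores))
```

In this simplified version, punctuation tokens without the marker are attached to the preceding word; averaging instead of summing is an equally plausible choice for the word-level score.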
62
+
63
+ From the user side, I decided to focus on interpretability of the results, showing how the 4 features contributed to the final result.
64
+ I also added perplexity visualization to help non-experts understand what the model is paying attention to.
65
+ I hope that this added interpretability makes the model less of a black box for users.
66
+
67
+ # Limitations
68
+ The LLM detector does not work well with short spans of text, as there is not sufficient data to make a strong inference.
69
+ It also works best in the original domain of Wikipedia-style text, as other domains such as fiction may have different distributions of sentence length and perplexity.
70
+ Text that an LLM has paraphrased from a vocabulary-rich sample is particularly hard to detect, because the model will reuse these high-perplexity words.
71
+ In general, longer prompts fool the classifier more, because they significantly alter the word distribution.
72
+ I also suspect that text from state-of-the-art models such as GPT-3.5/4 or PaLM may be harder to detect than text from the much simpler Curie GPT-3 model.
73
+ Ideally, I would use these models to generate training data for my classifier; however, this would incur significant expense.
74
+ Finally, the classifier can be tricked by simple modifications to LLM-written text, such as adding rare words or unusually long or short sentences.
75
+
76
+ Given these limitations, I would emphasize that this detector serves as only a loose guide and should not be used for cheating detection.
app.py CHANGED
@@ -78,7 +78,8 @@ Just then, a young man walked into the dining room and greeted the English guest
78
  “Buongiorno,” he said, “my name is George Emerson. I couldn’t help but notice that you were disappointed with your rooms. If you’d like, I could switch with you. My mother and I are in south rooms, and we’d be happy to take the north ones.”"""
79
 
80
  description = """This Space can be used to measure the likelihood of a text being generated by an LLM like ChatGPT.
81
- In general, human written text has higher perplexity, sentence length, and length variation than AI generated text, with lower length normality."""
 
82
 
83
 
84
  demo = gr.Interface(fn=score_text,
 
78
  “Buongiorno,” he said, “my name is George Emerson. I couldn’t help but notice that you were disappointed with your rooms. If you’d like, I could switch with you. My mother and I are in south rooms, and we’d be happy to take the north ones.”"""
79
 
80
  description = """This Space can be used to measure the likelihood of a text being generated by an LLM like ChatGPT.
81
+ In general, human written text has higher perplexity, sentence length, and length variation than AI generated text, with lower length normality.
82
+ Perplexity is a measure of how often uncommon words appear in the text."""
83
 
84
 
85
  demo = gr.Interface(fn=score_text,
main.ipynb ADDED
The diff for this file is too large to render. See raw diff