{"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"datalore":{"computation_mode":"JUPYTER","package_manager":"pip","base_environment":"default","packages":[],"report_row_ids":[],"version":3},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"nvidiaTeslaT4","dataSources":[{"sourceId":61542,"databundleVersionId":7516023,"sourceType":"competition"},{"sourceId":2468672,"sourceType":"datasetVersion","datasetId":1455358},{"sourceId":6977472,"sourceType":"datasetVersion","datasetId":4005256}],"dockerImageVersionId":30648,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":true}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# LLM - Detect AI Generated Text","metadata":{"datalore":{"node_id":"Pa2yxouvTjqcDsgCYUzPjr","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"markdown","source":"This competition challenges participants to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.","metadata":{}},{"cell_type":"markdown","source":"**Evaluation:**\nSubmissions are evaluated on area under the ROC curve between the predicted probability and the observed target.","metadata":{}},{"cell_type":"markdown","source":"## 1. 
Import libraries","metadata":{"datalore":{"node_id":"ek6n1GGlnkNFGIJ0DB0C1g","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"!pip install datasets==2.15","metadata":{"datalore":{"node_id":"gLJbvMXVqhcKYPln9YBRIq","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:12:50.593215Z","iopub.execute_input":"2024-02-07T16:12:50.593867Z","iopub.status.idle":"2024-02-07T16:13:03.047070Z","shell.execute_reply.started":"2024-02-07T16:12:50.593836Z","shell.execute_reply":"2024-02-07T16:13:03.045794Z"},"trusted":true},"execution_count":44,"outputs":[{"name":"stderr","text":"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\nTo disable this warning, you can either:\n\t- Avoid using `tokenizers` before the fork if possible\n\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n","output_type":"stream"},{"name":"stdout","text":"Requirement already satisfied: datasets==2.15 in /opt/conda/lib/python3.10/site-packages (2.15.0)\nRequirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (1.24.4)\nRequirement already satisfied: pyarrow>=8.0.0 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (11.0.0)\nRequirement already satisfied: pyarrow-hotfix in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (0.6)\nRequirement already satisfied: dill<0.3.8,>=0.3.0 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (0.3.7)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (2.1.4)\nRequirement already satisfied: requests>=2.19.0 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (2.31.0)\nRequirement already satisfied: tqdm>=4.62.1 in /opt/conda/lib/python3.10/site-packages (from 
datasets==2.15) (4.66.1)\nRequirement already satisfied: xxhash in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (3.4.1)\nRequirement already satisfied: multiprocess in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (0.70.15)\nRequirement already satisfied: fsspec<=2023.10.0,>=2023.1.0 in /opt/conda/lib/python3.10/site-packages (from fsspec[http]<=2023.10.0,>=2023.1.0->datasets==2.15) (2023.10.0)\nRequirement already satisfied: aiohttp in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (3.9.1)\nRequirement already satisfied: huggingface-hub>=0.18.0 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (0.20.3)\nRequirement already satisfied: packaging in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (21.3)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.10/site-packages (from datasets==2.15) (6.0.1)\nRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (23.2.0)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (6.0.4)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (1.9.3)\nRequirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (1.4.1)\nRequirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (1.3.1)\nRequirement already satisfied: async-timeout<5.0,>=4.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets==2.15) (4.0.3)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from huggingface-hub>=0.18.0->datasets==2.15) (3.13.1)\nRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.10/site-packages (from huggingface-hub>=0.18.0->datasets==2.15) 
(4.9.0)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.10/site-packages (from packaging->datasets==2.15) (3.1.1)\nRequirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->datasets==2.15) (3.3.2)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->datasets==2.15) (3.6)\nRequirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->datasets==2.15) (1.26.18)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->datasets==2.15) (2023.11.17)\nRequirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets==2.15) (2.8.2)\nRequirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets==2.15) (2023.3.post1)\nRequirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets==2.15) (2023.4)\nRequirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->datasets==2.15) (1.16.0)\n","output_type":"stream"}]},{"cell_type":"code","source":"import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\nimport re\n\nfrom transformers import PushToHubCallback\nfrom transformers import AutoTokenizer, DataCollatorWithPadding\nfrom transformers import TFAutoModelForSequenceClassification\nfrom tensorflow.keras.optimizers.schedules import PolynomialDecay\nfrom tensorflow.keras.optimizers import Adam\nimport tensorflow as tf\nfrom keras.callbacks import EarlyStopping\nimport datasets\n\nfrom sklearn.metrics import ConfusionMatrixDisplay\nfrom sklearn.metrics import classification_report, 
f1_score\n","metadata":{"datalore":{"node_id":"0tAJiErffFXiL5lrISxce7","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.049654Z","iopub.execute_input":"2024-02-07T16:13:03.049987Z","iopub.status.idle":"2024-02-07T16:13:03.059305Z","shell.execute_reply.started":"2024-02-07T16:13:03.049953Z","shell.execute_reply":"2024-02-07T16:13:03.058441Z"},"trusted":true},"execution_count":45,"outputs":[]},{"cell_type":"code","source":"from huggingface_hub import notebook_login\n\nnotebook_login()","metadata":{"execution":{"iopub.status.busy":"2024-02-07T16:13:03.060453Z","iopub.execute_input":"2024-02-07T16:13:03.060739Z","iopub.status.idle":"2024-02-07T16:13:03.097365Z","shell.execute_reply.started":"2024-02-07T16:13:03.060718Z","shell.execute_reply":"2024-02-07T16:13:03.096453Z"},"trusted":true},"execution_count":46,"outputs":[{"output_type":"display_data","data":{"text/plain":"VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"981bf8b1df664056a3bcd7cabcf5e55a"}},"metadata":{}}]},{"cell_type":"markdown","source":"## 2. 
Loading data","metadata":{"datalore":{"node_id":"f5wDDytoNShJY7UaEWvaWf","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"train_essays = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/train_essays.csv')\ntest_essays = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/test_essays.csv')\ntrain_prompts = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/train_prompts.csv')\nsample_submission = pd.read_csv('/kaggle/input/llm-detect-ai-generated-text/sample_submission.csv')","metadata":{"datalore":{"node_id":"5gijDcIM3hHmN4zzseqzyg","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.099503Z","iopub.execute_input":"2024-02-07T16:13:03.099771Z","iopub.status.idle":"2024-02-07T16:13:03.166248Z","shell.execute_reply.started":"2024-02-07T16:13:03.099749Z","shell.execute_reply":"2024-02-07T16:13:03.165491Z"},"trusted":true},"execution_count":47,"outputs":[]},{"cell_type":"code","source":"train_essays['text_len'] = train_essays['text'].apply(len)\ntrain_essays = 
train_essays.sort_values('text_len')\n","metadata":{"datalore":{"node_id":"rPOtNNxkbCjMHKpczt02A5","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.167214Z","iopub.execute_input":"2024-02-07T16:13:03.167466Z","iopub.status.idle":"2024-02-07T16:13:03.175550Z","shell.execute_reply.started":"2024-02-07T16:13:03.167444Z","shell.execute_reply":"2024-02-07T16:13:03.174645Z"},"trusted":true},"execution_count":48,"outputs":[]},{"cell_type":"code","source":"train_essays","metadata":{"datalore":{"node_id":"PriIabcV41XCCEl0mw9HTR","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.176815Z","iopub.execute_input":"2024-02-07T16:13:03.177126Z","iopub.status.idle":"2024-02-07T16:13:03.194813Z","shell.execute_reply.started":"2024-02-07T16:13:03.177083Z","shell.execute_reply":"2024-02-07T16:13:03.193744Z"},"trusted":true},"execution_count":49,"outputs":[{"execution_count":49,"output_type":"execute_result","data":{"text/plain":"            id  prompt_id                                               text  \\\n704   82131f68          1  This essay will analyze, discuss and prove one...   \n493   606ec542          0  I think limiting car usage is a great idea for...   \n1115  cbc48dd7          0  Zroom! Cars have been developing for hundreds ...   \n740   86fe4f18          1  I strongly believe that the Electoral College ...   \n1337  f81d371d          1  Dear, senator I believe the electoral college ...   \n...        ...        ...                                                ...   \n1082  c3e2e9e5          0  Driving is the primary way of transportation, ...   \n175   223bbf18          0  When limiting car usage the first thing that m...   \n326   40524218          0  As the global concern for the environment incr...   \n97    15f7ea58          1  Dear Senator, Concerning the topic of the meri...   
\n821   947b8cca          1  To tohe stoatoe and tohe stoatoe's countory, t...   \n\n      generated  text_len  \n704           1      1356  \n493           0      1486  \n1115          0      1492  \n740           1      1500  \n1337          0      1595  \n...         ...       ...  \n1082          0      6957  \n175           0      7190  \n326           0      7373  \n97            0      8033  \n821           0      8436  \n\n[1378 rows x 5 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>prompt_id</th>\n      <th>text</th>\n      <th>generated</th>\n      <th>text_len</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>704</th>\n      <td>82131f68</td>\n      <td>1</td>\n      <td>This essay will analyze, discuss and prove one...</td>\n      <td>1</td>\n      <td>1356</td>\n    </tr>\n    <tr>\n      <th>493</th>\n      <td>606ec542</td>\n      <td>0</td>\n      <td>I think limiting car usage is a great idea for...</td>\n      <td>0</td>\n      <td>1486</td>\n    </tr>\n    <tr>\n      <th>1115</th>\n      <td>cbc48dd7</td>\n      <td>0</td>\n      <td>Zroom! 
Cars have been developing for hundreds ...</td>\n      <td>0</td>\n      <td>1492</td>\n    </tr>\n    <tr>\n      <th>740</th>\n      <td>86fe4f18</td>\n      <td>1</td>\n      <td>I strongly believe that the Electoral College ...</td>\n      <td>1</td>\n      <td>1500</td>\n    </tr>\n    <tr>\n      <th>1337</th>\n      <td>f81d371d</td>\n      <td>1</td>\n      <td>Dear, senator I believe the electoral college ...</td>\n      <td>0</td>\n      <td>1595</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>1082</th>\n      <td>c3e2e9e5</td>\n      <td>0</td>\n      <td>Driving is the primary way of transportation, ...</td>\n      <td>0</td>\n      <td>6957</td>\n    </tr>\n    <tr>\n      <th>175</th>\n      <td>223bbf18</td>\n      <td>0</td>\n      <td>When limiting car usage the first thing that m...</td>\n      <td>0</td>\n      <td>7190</td>\n    </tr>\n    <tr>\n      <th>326</th>\n      <td>40524218</td>\n      <td>0</td>\n      <td>As the global concern for the environment incr...</td>\n      <td>0</td>\n      <td>7373</td>\n    </tr>\n    <tr>\n      <th>97</th>\n      <td>15f7ea58</td>\n      <td>1</td>\n      <td>Dear Senator, Concerning the topic of the meri...</td>\n      <td>0</td>\n      <td>8033</td>\n    </tr>\n    <tr>\n      <th>821</th>\n      <td>947b8cca</td>\n      <td>1</td>\n      <td>To tohe stoatoe and tohe stoatoe's countory, t...</td>\n      <td>0</td>\n      <td>8436</td>\n    </tr>\n  </tbody>\n</table>\n<p>1378 rows × 5 
columns</p>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"train_essays['text_len'].describe()","metadata":{"datalore":{"node_id":"BpbeG8fdc0uRS2KG0Apoht","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.196036Z","iopub.execute_input":"2024-02-07T16:13:03.196404Z","iopub.status.idle":"2024-02-07T16:13:03.210475Z","shell.execute_reply.started":"2024-02-07T16:13:03.196371Z","shell.execute_reply":"2024-02-07T16:13:03.209616Z"},"trusted":true},"execution_count":50,"outputs":[{"execution_count":50,"output_type":"execute_result","data":{"text/plain":"count    1378.000000\nmean     3169.050798\nstd       920.588198\nmin      1356.000000\n25%      2554.250000\n50%      2985.500000\n75%      3623.750000\nmax      8436.000000\nName: text_len, dtype: float64"},"metadata":{}}]},{"cell_type":"code","source":"ax = sns.countplot(data=train_essays, x='generated')\nax.bar_label(ax.containers[0])\nplt.title('Distribution of texts');","metadata":{"datalore":{"node_id":"1j0urtcgAun9NZKxouvwdA","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.211689Z","iopub.execute_input":"2024-02-07T16:13:03.212001Z","iopub.status.idle":"2024-02-07T16:13:03.413478Z","shell.execute_reply.started":"2024-02-07T16:13:03.211978Z","shell.execute_reply":"2024-02-07T16:13:03.412536Z"},"trusted":true},"execution_count":51,"outputs":[{"output_type":"display_data","data":{"text/plain":"<Figure size 640x480 with 1 Axes>","image/png":""},"metadata":{}}]},{"cell_type":"code","source":"generated_essays = train_essays[train_essays['generated'] == 
1]\ngenerated_essays","metadata":{"datalore":{"node_id":"PoYwJW5KtkraGRQiCmLFge","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.414733Z","iopub.execute_input":"2024-02-07T16:13:03.415006Z","iopub.status.idle":"2024-02-07T16:13:03.425880Z","shell.execute_reply.started":"2024-02-07T16:13:03.414982Z","shell.execute_reply":"2024-02-07T16:13:03.424958Z"},"trusted":true},"execution_count":52,"outputs":[{"execution_count":52,"output_type":"execute_result","data":{"text/plain":"            id  prompt_id                                               text  \\\n704   82131f68          1  This essay will analyze, discuss and prove one...   \n740   86fe4f18          1  I strongly believe that the Electoral College ...   \n1262  eafb8a56          0  Limiting car use causes pollution, increases c...   \n\n      generated  text_len  \n704           1      1356  \n740           1      1500  \n1262          1      1797  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>prompt_id</th>\n      <th>text</th>\n      <th>generated</th>\n      <th>text_len</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>704</th>\n      <td>82131f68</td>\n      <td>1</td>\n      <td>This essay will analyze, discuss and prove one...</td>\n      <td>1</td>\n      <td>1356</td>\n    </tr>\n    <tr>\n      <th>740</th>\n      <td>86fe4f18</td>\n      <td>1</td>\n      <td>I strongly believe that the Electoral College ...</td>\n      <td>1</td>\n      <td>1500</td>\n    </tr>\n    <tr>\n      <th>1262</th>\n      <td>eafb8a56</td>\n      <td>0</td>\n      <td>Limiting car 
use causes pollution, increases c...</td>\n      <td>1</td>\n      <td>1797</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"ax2 = sns.countplot(data=train_essays, x='prompt_id')\nax2.bar_label(ax2.containers[0])\nplt.title('Distribution of prompts');","metadata":{"datalore":{"node_id":"ylPh4tN2nxXxzTwLkufmz4","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.430932Z","iopub.execute_input":"2024-02-07T16:13:03.431331Z","iopub.status.idle":"2024-02-07T16:13:03.627755Z","shell.execute_reply.started":"2024-02-07T16:13:03.431287Z","shell.execute_reply":"2024-02-07T16:13:03.626923Z"},"trusted":true},"execution_count":53,"outputs":[{"output_type":"display_data","data":{"text/plain":"<Figure size 640x480 with 1 Axes>","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"### Exploring test_essays","metadata":{"datalore":{"node_id":"qn2ISuRTmxMZ27LkJpEDDM","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"test_essays","metadata":{"datalore":{"node_id":"swH8B4pdgpa0wOVoBhfbal","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.628834Z","iopub.execute_input":"2024-02-07T16:13:03.629176Z","iopub.status.idle":"2024-02-07T16:13:03.638296Z","shell.execute_reply.started":"2024-02-07T16:13:03.629143Z","shell.execute_reply":"2024-02-07T16:13:03.637369Z"},"trusted":true},"execution_count":54,"outputs":[{"execution_count":54,"output_type":"execute_result","data":{"text/plain":"         id  prompt_id          text\n0  0000aaaa          2  Aaa bbb ccc.\n1  1111bbbb          3  Bbb ccc ddd.\n2  2222cccc          4  CCC ddd eee.","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    
}\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>prompt_id</th>\n      <th>text</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0000aaaa</td>\n      <td>2</td>\n      <td>Aaa bbb ccc.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1111bbbb</td>\n      <td>3</td>\n      <td>Bbb ccc ddd.</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2222cccc</td>\n      <td>4</td>\n      <td>CCC ddd eee.</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"There is only dummy data in test_essays; after submission, the text will be replaced with real text.","metadata":{"datalore":{"node_id":"eCoBMveMWE8eAs95Z2g9oA","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"markdown","source":"### Exploring Train Prompts data","metadata":{"datalore":{"node_id":"urZA7FLe9NfAUqNoKjObON","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"train_prompts","metadata":{"datalore":{"node_id":"jUDH4BTi5z6rvr59VFk343","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.639476Z","iopub.execute_input":"2024-02-07T16:13:03.639753Z","iopub.status.idle":"2024-02-07T16:13:03.654284Z","shell.execute_reply.started":"2024-02-07T16:13:03.639730Z","shell.execute_reply":"2024-02-07T16:13:03.653356Z"},"trusted":true},"execution_count":55,"outputs":[{"execution_count":55,"output_type":"execute_result","data":{"text/plain":"   prompt_id                       prompt_name  \\\n0          0                   Car-free cities   \n1          1  Does the electoral college work?   \n\n                                        instructions  \\\n0  Write an explanatory essay to inform fellow ci...   
\n1  Write a letter to your state senator in which ...   \n\n                                         source_text  \n0  # In German Suburb, Life Goes On Without Cars ...  \n1  # What Is the Electoral College? by the Office...  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>prompt_id</th>\n      <th>prompt_name</th>\n      <th>instructions</th>\n      <th>source_text</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>Car-free cities</td>\n      <td>Write an explanatory essay to inform fellow ci...</td>\n      <td># In German Suburb, Life Goes On Without Cars ...</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>Does the electoral college work?</td>\n      <td>Write a letter to your state senator in which ...</td>\n      <td># What Is the Electoral College? by the Office...</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"train_prompts.iloc[0]['instructions']","metadata":{"datalore":{"node_id":"IhBtxcZceIE7zNVgxyRZml","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.655312Z","iopub.execute_input":"2024-02-07T16:13:03.655624Z","iopub.status.idle":"2024-02-07T16:13:03.665807Z","shell.execute_reply.started":"2024-02-07T16:13:03.655594Z","shell.execute_reply":"2024-02-07T16:13:03.664903Z"},"trusted":true},"execution_count":56,"outputs":[{"execution_count":56,"output_type":"execute_result","data":{"text/plain":"'Write an explanatory essay to inform fellow citizens about the advantages of limiting car usage. 
Your essay must be based on ideas and information that can be found in the passage set. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your essay in the space provided.'"},"metadata":{}}]},{"cell_type":"markdown","source":"**Prompt_id = 0**\\\n**prompt_name = Car-free cities**\\\n'Write an explanatory essay to inform fellow citizens about the advantages of limiting car usage. Your essay must be based on ideas and information that can be found in the passage set. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your essay in the space provided.'","metadata":{"datalore":{"node_id":"YBADwMdIO4pNpI8MUskJT7","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"train_prompts.iloc[1]['instructions']","metadata":{"datalore":{"node_id":"slJneLtnk2LCqrwBUMShys","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.666700Z","iopub.execute_input":"2024-02-07T16:13:03.666924Z","iopub.status.idle":"2024-02-07T16:13:03.676438Z","shell.execute_reply.started":"2024-02-07T16:13:03.666904Z","shell.execute_reply":"2024-02-07T16:13:03.675611Z"},"trusted":true},"execution_count":57,"outputs":[{"execution_count":57,"output_type":"execute_result","data":{"text/plain":"'Write a letter to your state senator in which you argue in favor of keeping the Electoral College or changing to election by popular vote for the president of the United States. 
Use the information from the texts in your essay. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to include a claim; address counterclaims; use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your response in the space provided.'"},"metadata":{}}]},{"cell_type":"markdown","source":"**Prompt_id = 1**\\\n**prompt_name = Does the electoral college work?**\\\n'Write a letter to your state senator in which you argue in favor of keeping the Electoral College or changing to election by popular vote for the president of the United States. Use the information from the texts in your essay. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to include a claim; address counterclaims; use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your response in the space provided.'","metadata":{"datalore":{"node_id":"KFNcrRCeP8O0rBaXUUMvFi","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"markdown","source":"## 3. 
Loading external dataset","metadata":{"datalore":{"node_id":"jD4ZTKf0EQNPKEX9ic8a5l","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"markdown","source":"Since the training set contains only 3 AI-generated essays, I need an additional dataset with AI-generated text.\\\nFortunately, such a dataset is available on Kaggle.","metadata":{"datalore":{"node_id":"Yy2tKqV4xjtf2WiwpLJNx5","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"external_essays = pd.read_csv('/kaggle/input/daigt-v2-train-dataset/train_v2_drcat_02.csv')","metadata":{"datalore":{"node_id":"IoEuwVB0cfQC5DSC41JG0V","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:03.677497Z","iopub.execute_input":"2024-02-07T16:13:03.677760Z","iopub.status.idle":"2024-02-07T16:13:04.747705Z","shell.execute_reply.started":"2024-02-07T16:13:03.677738Z","shell.execute_reply":"2024-02-07T16:13:04.746628Z"},"trusted":true},"execution_count":58,"outputs":[]},{"cell_type":"code","source":"external_essays.head(10)","metadata":{"datalore":{"node_id":"ZNPA0op16zbCTzBLwFuwmD","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:04.748974Z","iopub.execute_input":"2024-02-07T16:13:04.749308Z","iopub.status.idle":"2024-02-07T16:13:04.761456Z","shell.execute_reply.started":"2024-02-07T16:13:04.749281Z","shell.execute_reply":"2024-02-07T16:13:04.760478Z"},"trusted":true},"execution_count":59,"outputs":[{"execution_count":59,"output_type":"execute_result","data":{"text/plain":"                                                text  label  \\\n0  Phones\\n\\nModern humans today are always on th...      0   \n1  This essay will explain if drivers should or s...      0   \n2  Driving while the use of cellular devices\\n\\nT...      0   \n3  Phones & Driving\\n\\nDrivers should not be able...      
0   \n4  Cell Phone Operation While Driving\\n\\nThe abil...      0   \n5  Cell phone use should not be legal while drivi...      0   \n6  Phones and Driving\\n\\nDriving is a good way to...      0   \n7  PHONES AND DRIVING\\n\\nIn this world in which w...      0   \n8  People are debating whether if drivers should ...      0   \n9  Texting and driving\\n\\nOver half of drivers in...      0   \n\n          prompt_name           source  RDizzl3_seven  \n0  Phones and driving  persuade_corpus          False  \n1  Phones and driving  persuade_corpus          False  \n2  Phones and driving  persuade_corpus          False  \n3  Phones and driving  persuade_corpus          False  \n4  Phones and driving  persuade_corpus          False  \n5  Phones and driving  persuade_corpus          False  \n6  Phones and driving  persuade_corpus          False  \n7  Phones and driving  persuade_corpus          False  \n8  Phones and driving  persuade_corpus          False  \n9  Phones and driving  persuade_corpus          False  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text</th>\n      <th>label</th>\n      <th>prompt_name</th>\n      <th>source</th>\n      <th>RDizzl3_seven</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Phones\\n\\nModern humans today are always on th...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>This essay will explain if drivers should or s...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      
<th>2</th>\n      <td>Driving while the use of cellular devices\\n\\nT...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Phones &amp; Driving\\n\\nDrivers should not be able...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Cell Phone Operation While Driving\\n\\nThe abil...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Cell phone use should not be legal while drivi...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Phones and Driving\\n\\nDriving is a good way to...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>PHONES AND DRIVING\\n\\nIn this world in which w...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>People are debating whether if drivers should ...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>Texting and driving\\n\\nOver half of drivers in...</td>\n      <td>0</td>\n      <td>Phones and driving</td>\n      <td>persuade_corpus</td>\n      <td>False</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"external_essays['text_len'] = external_essays['text'].apply(len)\nexternal_essays = 
external_essays.sort_values('text_len')\nexternal_essays","metadata":{"datalore":{"node_id":"HUEkPkK7SPKGYXJXY3XqOc","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:04.762834Z","iopub.execute_input":"2024-02-07T16:13:04.763088Z","iopub.status.idle":"2024-02-07T16:13:04.815853Z","shell.execute_reply.started":"2024-02-07T16:13:04.763066Z","shell.execute_reply":"2024-02-07T16:13:04.814908Z"},"trusted":true},"execution_count":60,"outputs":[{"execution_count":60,"output_type":"execute_result","data":{"text/plain":"                                                    text  label  \\\n41204    In recent years, there has been a growing trend      1   \n40767   Dear Senator,\\n\\nI am writing in support of k...      1   \n41168   Car usage has long been a significant factor ...      1   \n41167   Limiting car usage is a concept that has gain...      1   \n34960  Passage 1:\\n\\nPassage 2:\\n\\nPassage 3:\\n\\nPass...      1   \n...                                                  ...    ...   \n8895   The author did not do a good job at supporting...      0   \n19290  Dear Senator,\\n\\nI favoring of keeping the Ele...      0   \n1772   This passage is about a germany mom from the s...      0   \n2549   Imagen the streets with no cars empty with onl...      0   \n1517   if we look back at time in the united states y...      0   \n\n                            prompt_name                              source  \\\n41204                 Distance learning  mistralai/Mistral-7B-Instruct-v0.1   \n40767  Does the electoral college work?  mistralai/Mistral-7B-Instruct-v0.1   \n41168                   Car-free cities  mistralai/Mistral-7B-Instruct-v0.1   \n41167                   Car-free cities  mistralai/Mistral-7B-Instruct-v0.1   \n34960         Seeking multiple opinions                      falcon_180b_v1   \n...                                 ...                                 ...   
\n8895                    Exploring Venus                     persuade_corpus   \n19290  Does the electoral college work?                     persuade_corpus   \n1772                    Car-free cities                     persuade_corpus   \n2549                    Car-free cities                     persuade_corpus   \n1517                    Car-free cities                     persuade_corpus   \n\n       RDizzl3_seven  text_len  \n41204          False        48  \n40767           True       272  \n41168           True       273  \n41167           True       304  \n34960          False       314  \n...              ...       ...  \n8895            True      9980  \n19290           True     10309  \n1772            True     11641  \n2549            True     18125  \n1517            True     18322  \n\n[44868 rows x 6 columns]","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text</th>\n      <th>label</th>\n      <th>prompt_name</th>\n      <th>source</th>\n      <th>RDizzl3_seven</th>\n      <th>text_len</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>41204</th>\n      <td>In recent years, there has been a growing trend</td>\n      <td>1</td>\n      <td>Distance learning</td>\n      <td>mistralai/Mistral-7B-Instruct-v0.1</td>\n      <td>False</td>\n      <td>48</td>\n    </tr>\n    <tr>\n      <th>40767</th>\n      <td>Dear Senator,\\n\\nI am writing in support of k...</td>\n      <td>1</td>\n      <td>Does the electoral college work?</td>\n      <td>mistralai/Mistral-7B-Instruct-v0.1</td>\n      <td>True</td>\n      <td>272</td>\n    </tr>\n    <tr>\n      <th>41168</th>\n      <td>Car usage has long been a significant 
factor ...</td>\n      <td>1</td>\n      <td>Car-free cities</td>\n      <td>mistralai/Mistral-7B-Instruct-v0.1</td>\n      <td>True</td>\n      <td>273</td>\n    </tr>\n    <tr>\n      <th>41167</th>\n      <td>Limiting car usage is a concept that has gain...</td>\n      <td>1</td>\n      <td>Car-free cities</td>\n      <td>mistralai/Mistral-7B-Instruct-v0.1</td>\n      <td>True</td>\n      <td>304</td>\n    </tr>\n    <tr>\n      <th>34960</th>\n      <td>Passage 1:\\n\\nPassage 2:\\n\\nPassage 3:\\n\\nPass...</td>\n      <td>1</td>\n      <td>Seeking multiple opinions</td>\n      <td>falcon_180b_v1</td>\n      <td>False</td>\n      <td>314</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>8895</th>\n      <td>The author did not do a good job at supporting...</td>\n      <td>0</td>\n      <td>Exploring Venus</td>\n      <td>persuade_corpus</td>\n      <td>True</td>\n      <td>9980</td>\n    </tr>\n    <tr>\n      <th>19290</th>\n      <td>Dear Senator,\\n\\nI favoring of keeping the Ele...</td>\n      <td>0</td>\n      <td>Does the electoral college work?</td>\n      <td>persuade_corpus</td>\n      <td>True</td>\n      <td>10309</td>\n    </tr>\n    <tr>\n      <th>1772</th>\n      <td>This passage is about a germany mom from the s...</td>\n      <td>0</td>\n      <td>Car-free cities</td>\n      <td>persuade_corpus</td>\n      <td>True</td>\n      <td>11641</td>\n    </tr>\n    <tr>\n      <th>2549</th>\n      <td>Imagen the streets with no cars empty with onl...</td>\n      <td>0</td>\n      <td>Car-free cities</td>\n      <td>persuade_corpus</td>\n      <td>True</td>\n      <td>18125</td>\n    </tr>\n    <tr>\n      <th>1517</th>\n      <td>if we look back at time in the united states y...</td>\n      <td>0</td>\n      <td>Car-free cities</td>\n      <td>persuade_corpus</td>\n      <td>True</td>\n      
<td>18322</td>\n    </tr>\n  </tbody>\n</table>\n<p>44868 rows × 6 columns</p>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"# Unique values in the columns\ncols_unique = ['label', 'prompt_name', 'source','RDizzl3_seven']\nfor col in cols_unique:\n    print(external_essays[col].unique())","metadata":{"datalore":{"node_id":"jXqzMwojHCEJxPBqjoKLAw","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:04.816862Z","iopub.execute_input":"2024-02-07T16:13:04.817150Z","iopub.status.idle":"2024-02-07T16:13:04.831336Z","shell.execute_reply.started":"2024-02-07T16:13:04.817102Z","shell.execute_reply":"2024-02-07T16:13:04.830016Z"},"trusted":true},"execution_count":61,"outputs":[{"name":"stdout","text":"[1 0]\n['Distance learning' 'Does the electoral college work?' 'Car-free cities'\n 'Seeking multiple opinions' 'Summer projects'\n 'Facial action coding system' 'Mandatory extracurricular activities'\n 'Grades for extracurricular activities' '\"A Cowboy Who Rode the Waves\"'\n 'Cell phones at school' 'Community service' 'Exploring Venus'\n 'Driverless cars' 'The Face on Mars' 'Phones and driving']\n['mistralai/Mistral-7B-Instruct-v0.1' 'falcon_180b_v1' 'chat_gpt_moth'\n 'mistral7binstruct_v2' 'llama2_chat' 'persuade_corpus'\n 'mistral7binstruct_v1' 'cohere-command' 'llama_70b_v1' 'palm-text-bison1'\n 'kingki19_palm' 'darragh_claude_v7' 'darragh_claude_v6' 'train_essays'\n 'NousResearch/Llama-2-7b-chat-hf' 'radek_500' 'radekgpt4']\n[False  True]\n","output_type":"stream"}]},{"cell_type":"code","source":"ax3 = sns.countplot(data=external_essays, x='label')\nax3.bar_label(ax3.containers[0])\nplt.title('Distribution of texts on 
external_essays');","metadata":{"datalore":{"node_id":"wLOfDfmnoYWEKy9skp1mZ8","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:04.834434Z","iopub.execute_input":"2024-02-07T16:13:04.834700Z","iopub.status.idle":"2024-02-07T16:13:05.017631Z","shell.execute_reply.started":"2024-02-07T16:13:04.834677Z","shell.execute_reply":"2024-02-07T16:13:05.016715Z"},"trusted":true},"execution_count":62,"outputs":[{"output_type":"display_data","data":{"text/plain":"<Figure size 640x480 with 1 Axes>","image/png":""
},"metadata":{}}]},{"cell_type":"code","source":"external_essays = external_essays.rename(columns={'label': 'generated'})","metadata":{"datalore":{"node_id":"4IqXcSQ20yKEwIVt4aEkMU","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.018746Z","iopub.execute_input":"2024-02-07T16:13:05.019087Z","iopub.status.idle":"2024-02-07T16:13:05.026593Z","shell.execute_reply.started":"2024-02-07T16:13:05.019055Z","shell.execute_reply":"2024-02-07T16:13:05.025693Z"},"trusted":true},"execution_count":63,"outputs":[]},{"cell_type":"markdown","source":"**Concatenating the datasets**","metadata":{"datalore":{"node_id":"lNnLE86XV6G6LqNWNs5enI","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"df = pd.concat([external_essays[['text', 'generated']], train_essays[['text', 'generated']]], ignore_index=True)","metadata":{"datalore":{"node_id":"rAypYp2yNcT6NHZjqdot6E","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.027827Z","iopub.execute_input":"2024-02-07T16:13:05.028447Z","iopub.status.idle":"2024-02-07T16:13:05.041241Z","shell.execute_reply.started":"2024-02-07T16:13:05.028418Z","shell.execute_reply":"2024-02-07T16:13:05.040288Z"},"trusted":true},"execution_count":64,"outputs":[]},{"cell_type":"code","source":"# df = df.sample(frac=0.02, random_state=42)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T16:13:05.042763Z","iopub.execute_input":"2024-02-07T16:13:05.043040Z","iopub.status.idle":"2024-02-07T16:13:05.051498Z","shell.execute_reply.started":"2024-02-07T16:13:05.043016Z","shell.execute_reply":"2024-02-07T16:13:05.050544Z"},"trusted":true},"execution_count":65,"outputs":[]},{"cell_type":"code","source":"ax4 = sns.countplot(data=df, x='generated')\nax4.bar_label(ax4.containers[0])\nplt.title('Distribution of 
Label');","metadata":{"datalore":{"node_id":"zGhGuMuLpxQUftCSQaishZ","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.052688Z","iopub.execute_input":"2024-02-07T16:13:05.052952Z","iopub.status.idle":"2024-02-07T16:13:05.253363Z","shell.execute_reply.started":"2024-02-07T16:13:05.052929Z","shell.execute_reply":"2024-02-07T16:13:05.252425Z"},"trusted":true},"execution_count":66,"outputs":[{"output_type":"display_data","data":{"text/plain":"<Figure size 640x480 with 1 Axes>","image/png":""},"metadata":{}}]},{"cell_type":"code","source":"df.head()","metadata":{"datalore":{"node_id":"ItJikExfedJp3LE8X7Srha","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.254735Z","iopub.execute_input":"2024-02-07T16:13:05.255590Z","iopub.status.idle":"2024-02-07T16:13:05.265677Z","shell.execute_reply.started":"2024-02-07T16:13:05.255533Z","shell.execute_reply":"2024-02-07T16:13:05.264692Z"},"trusted":true},"execution_count":67,"outputs":[{"execution_count":67,"output_type":"execute_result","data":{"text/plain":"                                                text  generated\n0    In recent years, there has been a growing trend          1\n1   Dear Senator,\\n\\nI am writing in support of k...          1\n2   Car usage has long been a significant factor ...          1\n3   Limiting car usage is a concept that has gain...          1\n4  Passage 1:\\n\\nPassage 2:\\n\\nPassage 3:\\n\\nPass...          
1","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>text</th>\n      <th>generated</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>In recent years, there has been a growing trend</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Dear Senator,\\n\\nI am writing in support of k...</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Car usage has long been a significant factor ...</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Limiting car usage is a concept that has gain...</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Passage 1:\\n\\nPassage 2:\\n\\nPassage 3:\\n\\nPass...</td>\n      <td>1</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"markdown","source":"**Clean text**","metadata":{"datalore":{"node_id":"8978uMqZOlfcnT3mHvQZp1","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"df.iloc[7][['text','generated']]","metadata":{"datalore":{"node_id":"3b3ir2r9QMkiwmxPnJ0E5l","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.267026Z","iopub.execute_input":"2024-02-07T16:13:05.267368Z","iopub.status.idle":"2024-02-07T16:13:05.279580Z","shell.execute_reply.started":"2024-02-07T16:13:05.267338Z","shell.execute_reply":"2024-02-07T16:13:05.278553Z"},"trusted":true},"execution_count":68,"outputs":[{"execution_count":68,"output_type":"execute_result","data":{"text/plain":"text         After researching extensively into different p...\ngenerated                 
                                   1\nName: 7, dtype: object"},"metadata":{}}]},{"cell_type":"code","source":"def clean_text(text):\n  #delete non-alphanumeric characters\n  text = re.sub(r\"[^A-Za-z0-9\\s]\", \"\", text)\n  #delete extra whitespaces\n  text = re.sub(r\"\\s+\", \" \", text)\n  #convert to lowercase\n  text = text.lower()\n\n  return text","metadata":{"datalore":{"node_id":"N1DjNMEJW5CsHvDCHngtmn","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.280640Z","iopub.execute_input":"2024-02-07T16:13:05.280937Z","iopub.status.idle":"2024-02-07T16:13:05.289950Z","shell.execute_reply.started":"2024-02-07T16:13:05.280914Z","shell.execute_reply":"2024-02-07T16:13:05.289146Z"},"trusted":true},"execution_count":69,"outputs":[]},{"cell_type":"code","source":"df['text'] = df['text'].map(clean_text)","metadata":{"datalore":{"node_id":"qjp0S2zUgdmaJANVSOwwXv","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:05.291016Z","iopub.execute_input":"2024-02-07T16:13:05.291337Z","iopub.status.idle":"2024-02-07T16:13:16.672977Z","shell.execute_reply.started":"2024-02-07T16:13:05.291313Z","shell.execute_reply":"2024-02-07T16:13:16.672155Z"},"trusted":true},"execution_count":70,"outputs":[]},{"cell_type":"code","source":"df.iloc[7]['text']","metadata":{"execution":{"iopub.status.busy":"2024-02-07T16:13:16.678991Z","iopub.execute_input":"2024-02-07T16:13:16.679353Z","iopub.status.idle":"2024-02-07T16:13:16.685339Z","shell.execute_reply.started":"2024-02-07T16:13:16.679328Z","shell.execute_reply":"2024-02-07T16:13:16.684457Z"},"trusted":true},"execution_count":71,"outputs":[{"execution_count":71,"output_type":"execute_result","data":{"text/plain":"'after researching extensively into different potential career paths i have shortlisted my top 5 options i have read up on the various qualifications responsibilities and 
other important aspects of each one and weighed them up against my skills and experience now i am planning to discuss my top 5 with my parents and teachers to get their opinions on which one could potentially benefit me the best'"},"metadata":{}}]},{"cell_type":"markdown","source":"## 4. Tokenizing","metadata":{"datalore":{"node_id":"2Tt8LbIyZKzdQgMtl7VNhK","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"# checkpoint = '../input/transformers/distilbert-base-uncased'\ncheckpoint = 'distilbert/distilbert-base-uncased-finetuned-sst-2-english'","metadata":{"datalore":{"node_id":"SrkC7QpLgPn1I0cIx9UaA0","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:16.686480Z","iopub.execute_input":"2024-02-07T16:13:16.686807Z","iopub.status.idle":"2024-02-07T16:13:16.698985Z","shell.execute_reply.started":"2024-02-07T16:13:16.686776Z","shell.execute_reply":"2024-02-07T16:13:16.698299Z"},"trusted":true},"execution_count":72,"outputs":[]},{"cell_type":"code","source":"batch_size = 25","metadata":{"execution":{"iopub.status.busy":"2024-02-07T16:13:16.700077Z","iopub.execute_input":"2024-02-07T16:13:16.700702Z","iopub.status.idle":"2024-02-07T16:13:16.711065Z","shell.execute_reply.started":"2024-02-07T16:13:16.700670Z","shell.execute_reply":"2024-02-07T16:13:16.710157Z"},"trusted":true},"execution_count":73,"outputs":[]},{"cell_type":"code","source":"tokenizer = 
AutoTokenizer.from_pretrained(checkpoint)","metadata":{"datalore":{"node_id":"2X59C8t23MfaAuhJcGqTAB","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:16.712051Z","iopub.execute_input":"2024-02-07T16:13:16.712349Z","iopub.status.idle":"2024-02-07T16:13:16.874897Z","shell.execute_reply.started":"2024-02-07T16:13:16.712325Z","shell.execute_reply":"2024-02-07T16:13:16.874159Z"},"trusted":true},"execution_count":74,"outputs":[]},{"cell_type":"code","source":"def tokenize_and_split(text):\n    encoding = tokenizer(\n        text[\"text\"],\n        truncation=True,\n        padding=True,\n        max_length=512,\n        return_overflowing_tokens=True,\n    )\n    # Extract mapping between new and old indices\n    sample_map = encoding.pop(\"overflow_to_sample_mapping\")\n    for key, values in text.items():\n        encoding[key] = [values[i] for i in sample_map]\n    return encoding","metadata":{"datalore":{"node_id":"Np01F6IxXptsTFbJEtQcVi","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:16.875924Z","iopub.execute_input":"2024-02-07T16:13:16.876226Z","iopub.status.idle":"2024-02-07T16:13:16.881595Z","shell.execute_reply.started":"2024-02-07T16:13:16.876202Z","shell.execute_reply":"2024-02-07T16:13:16.880686Z"},"trusted":true},"execution_count":75,"outputs":[]},{"cell_type":"code","source":"raw_ds = datasets.Dataset.from_pandas(df)\nraw_ds = 
raw_ds.train_test_split(test_size=0.2)\nraw_ds","metadata":{"datalore":{"node_id":"ok2iNWqKhS3Xr6BgnS8XKL","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:16.882783Z","iopub.execute_input":"2024-02-07T16:13:16.883139Z","iopub.status.idle":"2024-02-07T16:13:17.133386Z","shell.execute_reply.started":"2024-02-07T16:13:16.883084Z","shell.execute_reply":"2024-02-07T16:13:17.132340Z"},"trusted":true},"execution_count":76,"outputs":[{"execution_count":76,"output_type":"execute_result","data":{"text/plain":"DatasetDict({\n    train: Dataset({\n        features: ['text', 'generated'],\n        num_rows: 36996\n    })\n    test: Dataset({\n        features: ['text', 'generated'],\n        num_rows: 9250\n    })\n})"},"metadata":{}}]},{"cell_type":"code","source":"tokenized_ds = raw_ds.map(tokenize_and_split, batched=True)","metadata":{"datalore":{"node_id":"MDZBrC9YPCTvp51uU2TETh","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:17.134367Z","iopub.execute_input":"2024-02-07T16:13:17.136441Z","iopub.status.idle":"2024-02-07T16:13:57.609413Z","shell.execute_reply.started":"2024-02-07T16:13:17.136406Z","shell.execute_reply":"2024-02-07T16:13:57.608495Z"},"trusted":true},"execution_count":77,"outputs":[{"output_type":"display_data","data":{"text/plain":"Map:   0%|          | 0/36996 [00:00<?, ? examples/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"f76e65463f8941b9b07cb4edde985210"}},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"Map:   0%|          | 0/9250 [00:00<?, ? 
examples/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"1caf000576754e73b46a2b4837aad091"}},"metadata":{}}]},{"cell_type":"code","source":"tokenized_ds","metadata":{"datalore":{"node_id":"Kogb1E8iCGSyKM1a5tqNKr","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:57.610798Z","iopub.execute_input":"2024-02-07T16:13:57.611188Z","iopub.status.idle":"2024-02-07T16:13:57.619303Z","shell.execute_reply.started":"2024-02-07T16:13:57.611153Z","shell.execute_reply":"2024-02-07T16:13:57.618451Z"},"trusted":true},"execution_count":78,"outputs":[{"execution_count":78,"output_type":"execute_result","data":{"text/plain":"DatasetDict({\n    train: Dataset({\n        features: ['text', 'generated', 'input_ids', 'attention_mask'],\n        num_rows: 45367\n    })\n    test: Dataset({\n        features: ['text', 'generated', 'input_ids', 'attention_mask'],\n        num_rows: 11267\n    })\n})"},"metadata":{}}]},{"cell_type":"code","source":"data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors=\"tf\")","metadata":{"datalore":{"node_id":"ZiedoYlNkJO38MDwbBtk1D","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:57.620367Z","iopub.execute_input":"2024-02-07T16:13:57.620634Z","iopub.status.idle":"2024-02-07T16:13:57.634232Z","shell.execute_reply.started":"2024-02-07T16:13:57.620611Z","shell.execute_reply":"2024-02-07T16:13:57.633273Z"},"trusted":true},"execution_count":79,"outputs":[]},{"cell_type":"markdown","source":"## 5. 
Modelling","metadata":{"datalore":{"node_id":"PNCf05cZxmkj4mYnNrH1e6","type":"MD","hide_input_from_viewers":true,"hide_output_from_viewers":true}}},{"cell_type":"code","source":"tf_train_dataset = tokenized_ds['train'].to_tf_dataset(\n    columns=[\"attention_mask\", \"input_ids\"],\n    label_cols=[\"generated\"],\n    shuffle=True,\n    collate_fn=data_collator,\n    batch_size=batch_size,\n)\n\ntf_test_dataset = tokenized_ds['test'].to_tf_dataset(\n    columns=[\"attention_mask\", \"input_ids\"],\n    label_cols=[\"generated\"],\n    shuffle=False,\n    collate_fn=data_collator,\n    batch_size=batch_size,\n)","metadata":{"datalore":{"node_id":"59cwxcIVF1G0FQQI583weB","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:57.635235Z","iopub.execute_input":"2024-02-07T16:13:57.635464Z","iopub.status.idle":"2024-02-07T16:13:57.782092Z","shell.execute_reply.started":"2024-02-07T16:13:57.635444Z","shell.execute_reply":"2024-02-07T16:13:57.781228Z"},"trusted":true},"execution_count":80,"outputs":[{"name":"stderr","text":"/opt/conda/lib/python3.10/site-packages/datasets/arrow_dataset.py:399: FutureWarning: The output of `to_tf_dataset` will change when a passing single element list for `labels` or `columns` in the next datasets version. 
To return a tuple structure rather than dict, pass a single string.\nOld behaviour: columns=['a'], labels=['labels'] -> (tf.Tensor, tf.Tensor)  \n             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor)  \nNew behaviour: columns=['a'],labels=['labels'] -> ({'a': tf.Tensor}, {'labels': tf.Tensor})  \n             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor) \n  warnings.warn(\n","output_type":"stream"}]},{"cell_type":"code","source":"num_epochs = 2","metadata":{"execution":{"iopub.status.busy":"2024-02-07T16:13:57.783312Z","iopub.execute_input":"2024-02-07T16:13:57.783569Z","iopub.status.idle":"2024-02-07T16:13:57.787307Z","shell.execute_reply.started":"2024-02-07T16:13:57.783547Z","shell.execute_reply":"2024-02-07T16:13:57.786339Z"},"trusted":true},"execution_count":81,"outputs":[]},{"cell_type":"code","source":"model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2, from_pt=True)","metadata":{"datalore":{"node_id":"N0KmKqUgQAhDA3T22hXvH0","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:57.788362Z","iopub.execute_input":"2024-02-07T16:13:57.788638Z","iopub.status.idle":"2024-02-07T16:13:58.918881Z","shell.execute_reply.started":"2024-02-07T16:13:57.788614Z","shell.execute_reply":"2024-02-07T16:13:58.917823Z"},"trusted":true},"execution_count":82,"outputs":[{"name":"stderr","text":"All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.\n\nAll the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.\nIf your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.\n","output_type":"stream"}]},{"cell_type":"code","source":"# The number of training steps is the number of samples in the dataset, divided by the batch size then multiplied\n# by the 
total number of epochs. Note that the tf_train_dataset here is a batched tf.data.Dataset,\n# not the original Hugging Face Dataset, so its len() is already the number of batches per epoch\n# (num_samples / batch_size, rounded up).\nnum_train_steps = len(tf_train_dataset) * num_epochs\nlr_scheduler = PolynomialDecay(\n    initial_learning_rate=5e-5, end_learning_rate=0.0, decay_steps=num_train_steps\n)\n\nopt = Adam(learning_rate=lr_scheduler)\n\nloss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n\nes = EarlyStopping(monitor='val_loss', patience=1, verbose=1, mode='auto', restore_best_weights=True)\n\ncallback = PushToHubCallback(\n    \"LLM_generated_text_detector\", save_strategy=\"no\", tokenizer=tokenizer\n)","metadata":{"datalore":{"node_id":"KgCP6HcT4sIzhVUP361LJt","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:58.920321Z","iopub.execute_input":"2024-02-07T16:13:58.920593Z","iopub.status.idle":"2024-02-07T16:13:59.156543Z","shell.execute_reply.started":"2024-02-07T16:13:58.920568Z","shell.execute_reply":"2024-02-07T16:13:59.155652Z"},"trusted":true},"execution_count":83,"outputs":[{"name":"stderr","text":"/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.\nFor more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.\n  warnings.warn(warning_message, FutureWarning)\n/kaggle/working/LLM_generated_text_detector is already a clone of https://huggingface.co/Wintersmith/LLM_generated_text_detector. 
Make sure you pull the latest changes with `repo.git_pull()`.\n","output_type":"stream"}]},{"cell_type":"code","source":"model.compile(optimizer=opt, loss=loss, metrics=[\"accuracy\"])","metadata":{"datalore":{"node_id":"FWsaXd4PpT0LElBgYMitIc","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:59.157601Z","iopub.execute_input":"2024-02-07T16:13:59.157864Z","iopub.status.idle":"2024-02-07T16:13:59.169580Z","shell.execute_reply.started":"2024-02-07T16:13:59.157840Z","shell.execute_reply":"2024-02-07T16:13:59.168698Z"},"trusted":true},"execution_count":84,"outputs":[]},{"cell_type":"code","source":"model.fit(tf_train_dataset, validation_data=tf_test_dataset, epochs=num_epochs, callbacks=[es, callback])","metadata":{"datalore":{"node_id":"Q57EJS3MbWm5NXRe9pHyM6","type":"CODE","hide_input_from_viewers":true,"hide_output_from_viewers":true},"execution":{"iopub.status.busy":"2024-02-07T16:13:59.170764Z","iopub.execute_input":"2024-02-07T16:13:59.171079Z","iopub.status.idle":"2024-02-07T17:47:39.001846Z","shell.execute_reply.started":"2024-02-07T16:13:59.171049Z","shell.execute_reply":"2024-02-07T17:47:39.000824Z"},"trusted":true},"execution_count":85,"outputs":[{"name":"stdout","text":"Epoch 1/2\n1815/1815 [==============================] - 2812s 2s/step - loss: 0.0579 - accuracy: 0.9809 - val_loss: 0.0272 - val_accuracy: 0.9920\nEpoch 2/2\n1815/1815 [==============================] - 2790s 2s/step - loss: 0.0082 - accuracy: 0.9974 - val_loss: 0.0191 - val_accuracy: 0.9941\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"Upload file tf_model.h5:   0%|          | 1.00/256M [00:00<?, ?B/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"8131a47329644fd8a4adaeb6488e0e43"}},"metadata":{}},{"name":"stderr","text":"To https://huggingface.co/Wintersmith/LLM_generated_text_detector\n   9aa28a0..f4838f7  main -> 
main\n\n","output_type":"stream"},{"execution_count":85,"output_type":"execute_result","data":{"text/plain":"<keras.src.callbacks.History at 0x7c74e11d0070>"},"metadata":{}}]},{"cell_type":"code","source":"#model.save_pretrained(\"/kaggle/working/model_trained\", saved_model=True)\n","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:47:39.003003Z","iopub.execute_input":"2024-02-07T17:47:39.003387Z","iopub.status.idle":"2024-02-07T17:47:39.007252Z","shell.execute_reply.started":"2024-02-07T17:47:39.003360Z","shell.execute_reply":"2024-02-07T17:47:39.006371Z"},"trusted":true},"execution_count":86,"outputs":[]},{"cell_type":"markdown","source":"## 6. Evaluation","metadata":{}},{"cell_type":"code","source":"preds = model.predict(tf_test_dataset)[\"logits\"]\n\ny_pred = np.argmax(preds, axis=1)\nprint(preds.shape, y_pred.shape)\n\ny_pred\n\n#y_true = np.concatenate([y for x, y in tf_test_dataset], axis=0)\n# https://stackoverflow.com/questions/56226621/how-to-extract-data-labels-back-from-tensorflow-dataset","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:47:39.008525Z","iopub.execute_input":"2024-02-07T17:47:39.009205Z","iopub.status.idle":"2024-02-07T17:51:20.348390Z","shell.execute_reply.started":"2024-02-07T17:47:39.009171Z","shell.execute_reply":"2024-02-07T17:51:20.347443Z"},"trusted":true},"execution_count":87,"outputs":[{"name":"stdout","text":"451/451 [==============================] - 221s 486ms/step\n(11267, 2) (11267,)\n","output_type":"stream"},{"execution_count":87,"output_type":"execute_result","data":{"text/plain":"array([1, 0, 0, ..., 1, 1, 0])"},"metadata":{}}]},{"cell_type":"code","source":"def get_probabilities(input_text):\n    logits_pred = model.predict(input_text)['logits']\n    # The model outputs two logits and was trained with softmax cross-entropy,\n    # so normalize across the class axis with softmax rather than an element-wise sigmoid.\n    probs = tf.nn.softmax(logits_pred, axis=-1)\n    class_1_probability = probs[:, 1].numpy()\n    return 
class_1_probability","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:51:20.349613Z","iopub.execute_input":"2024-02-07T17:51:20.349921Z","iopub.status.idle":"2024-02-07T17:51:20.355158Z","shell.execute_reply.started":"2024-02-07T17:51:20.349892Z","shell.execute_reply":"2024-02-07T17:51:20.354154Z"},"trusted":true},"execution_count":88,"outputs":[]},{"cell_type":"code","source":"y_prob = get_probabilities(tf_test_dataset)\ny_prob","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:51:20.356452Z","iopub.execute_input":"2024-02-07T17:51:20.356843Z","iopub.status.idle":"2024-02-07T17:54:59.830689Z","shell.execute_reply.started":"2024-02-07T17:51:20.356797Z","shell.execute_reply":"2024-02-07T17:54:59.829525Z"},"trusted":true},"execution_count":89,"outputs":[{"name":"stdout","text":"451/451 [==============================] - 219s 486ms/step\n","output_type":"stream"},{"execution_count":89,"output_type":"execute_result","data":{"text/plain":"array([0.9966794 , 0.01150257, 0.01455727, ..., 0.99643433, 0.99624926,\n       0.03235682], dtype=float32)"},"metadata":{}}]},{"cell_type":"code","source":"y_true = np.concatenate([y for x, y in tf_test_dataset], axis=0)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:54:59.831848Z","iopub.execute_input":"2024-02-07T17:54:59.832147Z","iopub.status.idle":"2024-02-07T17:55:02.179651Z","shell.execute_reply.started":"2024-02-07T17:54:59.832104Z","shell.execute_reply":"2024-02-07T17:55:02.178681Z"},"trusted":true},"execution_count":90,"outputs":[]},{"cell_type":"code","source":"def evaluate_model(y_true, y_pred):\n  print(ConfusionMatrixDisplay.from_predictions(y_true, y_pred))\n  print(classification_report(y_true, y_pred))\n  print('F1 score:', f1_score(y_true, 
y_pred))","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.181096Z","iopub.execute_input":"2024-02-07T17:55:02.181519Z","iopub.status.idle":"2024-02-07T17:55:02.186565Z","shell.execute_reply.started":"2024-02-07T17:55:02.181485Z","shell.execute_reply":"2024-02-07T17:55:02.185529Z"},"trusted":true},"execution_count":91,"outputs":[]},{"cell_type":"code","source":"evaluate_model(y_true=y_true, y_pred=y_pred)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.187803Z","iopub.execute_input":"2024-02-07T17:55:02.188185Z","iopub.status.idle":"2024-02-07T17:55:02.500352Z","shell.execute_reply.started":"2024-02-07T17:55:02.188150Z","shell.execute_reply":"2024-02-07T17:55:02.499174Z"},"trusted":true},"execution_count":92,"outputs":[{"name":"stdout","text":"<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x7c74cc328730>\n              precision    recall  f1-score   support\n\n           0       1.00      0.99      1.00      7613\n           1       0.99      0.99      0.99      3654\n\n    accuracy                           0.99     11267\n   macro avg       0.99      0.99      0.99     11267\nweighted avg       0.99      0.99      0.99     11267\n\nF1 score: 0.9908457439540921\n","output_type":"stream"},{"output_type":"display_data","data":{"text/plain":"<Figure size 640x480 with 2 Axes>","image/png":""},"metadata":{}}]},{"cell_type":"markdown","source":"## 7. 
Submitting predictions on test set","metadata":{}},{"cell_type":"code","source":"test_essays\n","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.501599Z","iopub.execute_input":"2024-02-07T17:55:02.501910Z","iopub.status.idle":"2024-02-07T17:55:02.513693Z","shell.execute_reply.started":"2024-02-07T17:55:02.501884Z","shell.execute_reply":"2024-02-07T17:55:02.512625Z"},"trusted":true},"execution_count":93,"outputs":[{"execution_count":93,"output_type":"execute_result","data":{"text/plain":"         id  prompt_id          text\n0  0000aaaa          2  Aaa bbb ccc.\n1  1111bbbb          3  Bbb ccc ddd.\n2  2222cccc          4  CCC ddd eee.","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>id</th>\n      <th>prompt_id</th>\n      <th>text</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0000aaaa</td>\n      <td>2</td>\n      <td>Aaa bbb ccc.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1111bbbb</td>\n      <td>3</td>\n      <td>Bbb ccc ddd.</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2222cccc</td>\n      <td>4</td>\n      <td>CCC ddd eee.</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]},{"cell_type":"code","source":"test_essays['text'] = test_essays['text'].map(clean_text)\n","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.516021Z","iopub.execute_input":"2024-02-07T17:55:02.516980Z","iopub.status.idle":"2024-02-07T17:55:02.537197Z","shell.execute_reply.started":"2024-02-07T17:55:02.516927Z","shell.execute_reply":"2024-02-07T17:55:02.535910Z"},"trusted":true},"execution_count":94,"outputs":[]},{"cell_type":"code","source":"raw_final_ds = 
datasets.Dataset.from_pandas(test_essays)\nraw_final_ds","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.538476Z","iopub.execute_input":"2024-02-07T17:55:02.538853Z","iopub.status.idle":"2024-02-07T17:55:02.556050Z","shell.execute_reply.started":"2024-02-07T17:55:02.538825Z","shell.execute_reply":"2024-02-07T17:55:02.554954Z"},"trusted":true},"execution_count":95,"outputs":[{"execution_count":95,"output_type":"execute_result","data":{"text/plain":"Dataset({\n    features: ['id', 'prompt_id', 'text'],\n    num_rows: 3\n})"},"metadata":{}}]},{"cell_type":"code","source":"tokenized_test_dataset = raw_final_ds.map(tokenize_and_split, batched=True)\ntokenized_test_dataset","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.557634Z","iopub.execute_input":"2024-02-07T17:55:02.558575Z","iopub.status.idle":"2024-02-07T17:55:02.611994Z","shell.execute_reply.started":"2024-02-07T17:55:02.558522Z","shell.execute_reply":"2024-02-07T17:55:02.610964Z"},"trusted":true},"execution_count":96,"outputs":[{"output_type":"display_data","data":{"text/plain":"Map:   0%|          | 0/3 [00:00<?, ? 
examples/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"2935bcca138c4ede9876e51cb63fd1d5"}},"metadata":{}},{"execution_count":96,"output_type":"execute_result","data":{"text/plain":"Dataset({\n    features: ['id', 'prompt_id', 'text', 'input_ids', 'attention_mask'],\n    num_rows: 3\n})"},"metadata":{}}]},{"cell_type":"code","source":"tf_final_dataset = tokenized_test_dataset.to_tf_dataset(\n    columns=[\"attention_mask\", \"input_ids\"],\n    #label_cols=[\"target\"],\n    shuffle=False,\n    collate_fn=data_collator,\n    batch_size=batch_size,\n)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.613469Z","iopub.execute_input":"2024-02-07T17:55:02.613883Z","iopub.status.idle":"2024-02-07T17:55:02.683005Z","shell.execute_reply.started":"2024-02-07T17:55:02.613845Z","shell.execute_reply":"2024-02-07T17:55:02.682161Z"},"trusted":true},"execution_count":97,"outputs":[]},{"cell_type":"code","source":"class_1_final_probability = get_probabilities(tf_final_dataset)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:02.684201Z","iopub.execute_input":"2024-02-07T17:55:02.684505Z","iopub.status.idle":"2024-02-07T17:55:04.305151Z","shell.execute_reply.started":"2024-02-07T17:55:02.684479Z","shell.execute_reply":"2024-02-07T17:55:04.304117Z"},"trusted":true},"execution_count":98,"outputs":[{"name":"stdout","text":"1/1 [==============================] - 2s 2s/step\n","output_type":"stream"}]},{"cell_type":"code","source":"sample_submission[\"generated\"] = 
class_1_final_probability","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:04.306598Z","iopub.execute_input":"2024-02-07T17:55:04.306963Z","iopub.status.idle":"2024-02-07T17:55:04.311697Z","shell.execute_reply.started":"2024-02-07T17:55:04.306929Z","shell.execute_reply":"2024-02-07T17:55:04.310770Z"},"trusted":true},"execution_count":99,"outputs":[]},{"cell_type":"code","source":"sample_submission.to_csv(\"/kaggle/working/submission.csv\", index=False)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:04.313079Z","iopub.execute_input":"2024-02-07T17:55:04.313775Z","iopub.status.idle":"2024-02-07T17:55:04.334285Z","shell.execute_reply.started":"2024-02-07T17:55:04.313692Z","shell.execute_reply":"2024-02-07T17:55:04.333453Z"},"trusted":true},"execution_count":100,"outputs":[]},{"cell_type":"markdown","source":"## 8. Predict one sentence","metadata":{}},{"cell_type":"code","source":"# sequence_text = input your own text\nsequence_text = raw_ds['test']['text'][15]\nprint(f\"Label: {raw_ds['test']['generated'][15]}\")\nprint(f\"Text:\\n {sequence_text}\")","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:04.335269Z","iopub.execute_input":"2024-02-07T17:55:04.335504Z","iopub.status.idle":"2024-02-07T17:55:04.471992Z","shell.execute_reply.started":"2024-02-07T17:55:04.335477Z","shell.execute_reply":"2024-02-07T17:55:04.471135Z"},"trusted":true},"execution_count":101,"outputs":[{"name":"stdout","text":"Label: 0\nText:\n to principal students should not be able to have or do community service if they where not convicted of a crime they committed in school of on school grounds community service is understandable to me but its not fair to have students help out others if they werent convicted of bad behavior an example to that u cant order a student to do community service for himher for picking up trash another student dropped in the hallway in addition to that students have the rite to do community service if not given to himher 
but if community service is given then heshe should be punished for there actions but what did they do to deserve community service some students should be punished for bad behavior talking back even just being plan out disrespectful to one another but what they shouldnt have community service for is helping out another person when they are caught in the act of someone else being mean to himher also being a good character to the school classmates teachers staff and custodians if someone is not so you can show that person how to be respectful an how some should act at all times maybe staff and teachers like to punish us because of a ruff past they had growing up as a child thats not rite so they want to give us community service make us there little pets but that shouldnt be the answer to none of there problems sencirley studentname\n","output_type":"stream"}]},{"cell_type":"code","source":"sequence = sequence_text\nmodel_input = tokenizer(sequence, max_length=512, padding=True, truncation=True, return_tensors='tf')\nmodel_input = dict(model_input)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:04.473129Z","iopub.execute_input":"2024-02-07T17:55:04.473403Z","iopub.status.idle":"2024-02-07T17:55:04.478948Z","shell.execute_reply.started":"2024-02-07T17:55:04.473379Z","shell.execute_reply":"2024-02-07T17:55:04.478053Z"},"trusted":true},"execution_count":102,"outputs":[]},{"cell_type":"code","source":"get_probabilities(model_input)","metadata":{"execution":{"iopub.status.busy":"2024-02-07T17:55:04.479877Z","iopub.execute_input":"2024-02-07T17:55:04.480184Z","iopub.status.idle":"2024-02-07T17:55:06.157918Z","shell.execute_reply.started":"2024-02-07T17:55:04.480160Z","shell.execute_reply":"2024-02-07T17:55:06.157147Z"},"trusted":true},"execution_count":103,"outputs":[{"name":"stdout","text":"1/1 [==============================] - 2s 
2s/step\n","output_type":"stream"},{"execution_count":103,"output_type":"execute_result","data":{"text/plain":"array([0.01741369], dtype=float32)"},"metadata":{}}]}]}