FredZhang7/malphish-eater-v1


FredZhang7 committed on Aug 11, 2023

Commit cf75e58 • 1 Parent(s): d2fd2f1

add evaluation

Files changed (1)

README.md +3 -2


@@ -54,14 +54,15 @@ The classification task for v1 is split into two stages:
 1. URL features model
     - **96.5%+ accurate** on training and validation data
     - 2,436,727 rows of labelled URLs
-    - evaluation from v2: slightly overfitted
 2. Website features model
     - **98.4% accurate** on training data, and **98.9% accurate** on validation data
     - 911,180 rows of 42 features
 ## Training Features
 I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
-Here's the dict passed to `GridSearchCV`:
 ```python
 params = {
     'objective': 'binary',

 1. URL features model
     - **96.5%+ accurate** on training and validation data
     - 2,436,727 rows of labelled URLs
+    - evaluation from v2: slightly overfitted, by roughly 0.8%
 2. Website features model
     - **98.4% accurate** on training data, and **98.9% accurate** on validation data
     - 911,180 rows of 42 features
+    - evaluation from v2: biased towards the URL feature (bert_confidence) column
 ## Training Features
 I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
+Here's the dict passed to `sklearn`'s `GridSearchCV`:
 ```python
 params = {
     'objective': 'binary',