---
license: mit
library_name: sklearn
tags:
- text-classification
- sklearn
- phishing
- url
- onnx
model_format: pickle
model_file: model.pkl
inference: true
pipeline_tag: text-classification
---

# Model Description


The model predicts the probability that a URL points to a phishing site, using a list of features.  
For background on phishing, see the Wikipedia page:  
[https://en.wikipedia.org/wiki/Phishing](https://en.wikipedia.org/wiki/Phishing)
(this is not a phishing link 😜)

- **Model type:** Linear SVM 
- **Task:** Binary classification
- **License:** MIT
- **Repository:** https://github.com/pirocheto/phishing-url-detection


## Evaluation

| Metric    |    Value |
|-----------|----------|
| accuracy  | 0.945652 |
| f1-score  | 0.945114 |
| precision | 0.951996 |
| recall    | 0.938331 |
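
As a quick sanity check on the table above, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported values; the variable names are illustrative only.

```python
# Values copied from the evaluation table above.
precision = 0.951996
recall = 0.938331

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"f1-score: {f1:.6f}")
# prints: f1-score: 0.945114
```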

# How to Get Started with the Model


Loading models with pickle is discouraged: deserializing a pickle file can execute arbitrary code, so a tampered model file is a security risk.
Pickle files also lack portability across Python versions and interoperability with other languages.

Instead, we recommend using the ONNX model, which is more secure.
It is also lighter and faster, and it can be used from any language supported by the [ONNX runtime](https://onnxruntime.ai/docs/get-started/) (see below for an example using NodeJS).
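
To make the deserialization risk concrete, here is a minimal, harmless sketch (not taken from this repository) of how unpickling can run code chosen by whoever produced the file. The `Payload` class is hypothetical, and `print` stands in for any importable callable an attacker might pick.

```python
import pickle


class Payload:
    """Hypothetical, harmless stand-in for a malicious object in a pickle file."""

    def __reduce__(self):
        # Tells pickle: "on load, call print(...) with this argument".
        # An attacker could name any other importable callable here.
        return (print, ("this ran during unpickling",))


blob = pickle.dumps(Payload())

# Loading the bytes calls print(...); no Payload instance is rebuilt,
# and the value returned is whatever the callable returned (None here).
restored = pickle.loads(blob)
```

Because `joblib.load` relies on pickle as well, the same caveat applies to it: only load pickle files from sources you trust.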


## With ONNX (recommended)

### Python

<details>
  <summary>See the code snippet</summary>

```python
import numpy as np
import onnxruntime
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Initializing the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(
    model_path,
    providers=["CPUExecutionProvider"],
)

urls = [
    "https://en.wikipedia.org/wiki/Phishing",
    "http//weird-website.com",
]
inputs = np.array(urls, dtype="str")

# Using the ONNX model to make predictions on the input data
results = sess.run(None, {"inputs": inputs})[1]

for url, proba in zip(urls, results):
    print(f"URL: {url}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f} %")
    print("----")

# Expected output:
# URL: https://en.wikipedia.org/wiki/Phishing
# Likelihood of being a phishing site: 0.00 %
# ----
# URL: http//weird-website.com
# Likelihood of being a phishing site: 66.34 %
# ----
```
</details>
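
The probabilities returned above still have to be turned into a decision. The sketch below shows one way to do it, assuming a cut-off of 0.5. The helper name `flag_phishing` and the threshold value are illustrative choices, not part of the model, and the sample probabilities are the rounded values from the expected output above.

```python
import numpy as np


def flag_phishing(urls, probas, threshold=0.5):
    """Return (url, phishing_probability, is_flagged) for each URL.

    `probas` is expected to have one row per URL and two columns,
    with the phishing probability in column 1, as in the snippet above.
    """
    probas = np.asarray(probas, dtype=float)
    phishing_scores = probas[:, 1]
    return [
        (url, float(score), bool(score >= threshold))
        for url, score in zip(urls, phishing_scores)
    ]


urls = [
    "https://en.wikipedia.org/wiki/Phishing",
    "http//weird-website.com",
]
# Probabilities taken from the expected output above (rounded).
sample_probas = [[1.0, 0.0], [0.3366, 0.6634]]

for url, score, flagged in flag_phishing(urls, sample_probas):
    label = "phishing" if flagged else "legitimate"
    print(f"{url}: {label} ({score:.2%})")
```

A lower threshold catches more phishing URLs at the cost of more false alarms, so the right value depends on your use case.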

### NodeJS

<details>
  <summary>See the code snippet</summary>

```javascript
const ort = require('onnxruntime-node');

async function main() {
    // Make sure you have downloaded the model.onnx
    // Creating an ONNX inference session with the specified model
    const model_path = "./models/model.onnx";
    const session = await ort.InferenceSession.create(model_path);

    const urls = [
        "https://en.wikipedia.org/wiki/Phishing",
        "http//weird-website.com",
    ]
    
    // Creating an ONNX tensor from the input data
    const tensor = new ort.Tensor('string', urls, [urls.length,]);
    
    // Executing the inference session with the input tensor
    const results = await session.run({"inputs": tensor});
    const probas = results['probabilities'].data;
    
    // Displaying results for each URL
    urls.forEach((url, index) => {
        const proba = probas[index * 2 + 1];
        const percent = (proba * 100).toFixed(2);
        
        console.log(`URL: ${url}`);
        console.log(`Likelihood of being a phishing site: ${percent}%`);
        console.log("----");
    });
}

main();

// Expected output:
// URL: https://en.wikipedia.org/wiki/Phishing
// Likelihood of being a phishing site: 0.00%
// ----
// URL: http//weird-website.com
// Likelihood of being a phishing site: 66.34%
// ----

```
</details>


## With joblib or pickle (not recommended)

<details>
  <summary>See the code snippet</summary>

```python
import joblib
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.pkl"

# Download the model from the Hugging Face Model Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

urls = [
    "https://en.wikipedia.org/wiki/Phishing",
    "http//weird-website.com",
]

# Load the downloaded model using joblib
model = joblib.load(model_path)

# Predict probabilities for each URL
probas = model.predict_proba(urls)

for url, proba in zip(urls, probas):
    print(f"URL: {url}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f} %")
    print("----")
```
</details>
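
If you do load the pickle file, one possible precaution is to check that the downloaded file matches a digest you recorded earlier from a copy you trust. The sketch below is only an illustration under that assumption: `sha256_of` and `load_if_trusted` are hypothetical helpers, and a matching checksum does not make pickle safe, it only confirms the file has not changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_if_trusted(path, expected_sha256, loader):
    """Call `loader(path)` only if the file digest matches `expected_sha256`."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return loader(path)


# Hypothetical usage with the snippet above:
# model = load_if_trusted(model_path, "<digest you recorded>", joblib.load)
```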