mastavtsev
committed on
Commit
•
24dd3b4
1
Parent(s):
d66695c
Update README.md
README.md
CHANGED
@@ -2,21 +2,25 @@
|
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
-
|
6 |
-
research can be found here: https://github.com/mastavtsev/PM_NLP/tree/main.
|
7 |
|
8 |
-
|
9 |
-
|
10 |
-
of trace tokens corresponding to normal program execution.
|
11 |
|
12 |
-
|
13 |
-
|
14 |
-
- 512 context window size
|
15 |
-
- 2.5e-3 learning rate
|
16 |
-
- LAMB optimizer
|
17 |
-
- 300 epochs
|
18 |
-
- 43.6 million parameters
|
19 |
-
- 1.5 hours in Google Colab with GPU A100
|
20 |
|
21 |
-
|
22 |
-
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
+
## SqueezeBERT Model for Unsupervised Anomaly Detection
|
|
|
6 |
|
7 |
+
### Overview
|
8 |
+
This model was developed as a course project during my third year at the HSE Faculty of Computer Science (FCS). It uses the SqueezeBERT architecture for unsupervised anomaly detection: the model is trained with masked language modeling on trace tokens from normal program execution, so that traces deviating from the learned patterns can be identified as anomalies.
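
The overview does not state how an anomaly score is derived from the masked language model. One common approach, shown here only as a minimal sketch and not as the method used in this project, is to mask tokens of a trace, read off the probability the model assigns to each original token, and average the negative log-likelihood. The function names, the `eps` floor, and the threshold are hypothetical.

```python
import math
from typing import Sequence


def trace_anomaly_score(true_token_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the original tokens at masked positions.

    `true_token_probs` holds, for each masked position, the probability the
    model assigned to the token that was actually there (an assumed input).
    """
    if not true_token_probs:
        raise ValueError("need at least one masked position")
    # Clamp probabilities to eps so that log(0) never occurs.
    total = sum(math.log(max(p, eps)) for p in true_token_probs)
    return -total / len(true_token_probs)


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a trace whose score exceeds a threshold chosen on normal traces."""
    return score > threshold


if __name__ == "__main__":
    normal = trace_anomaly_score([0.9, 0.8, 0.95])   # tokens were well predicted
    unusual = trace_anomaly_score([0.2, 0.05, 0.1])  # tokens were poorly predicted
    print(normal < unusual)
```

A trace the model predicts well gets a low score, and a poorly predicted trace gets a high one; the threshold would have to be calibrated on traces known to be normal.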
|
|
|
9 |
|
10 |
+
### Research Notebooks
|
11 |
+
Python notebooks documenting the research and methodology are available on GitHub: [PM_NLP repository](https://github.com/mastavtsev/PM_NLP/tree/main).
|
12 |
|
13 |
+
### Model Configuration
|
14 |
+
- **Architecture**: SqueezeBERT, adapted for masked language modeling.
|
15 |
+
- **Tokenizer**: LOA 13 with a dictionary size of 20,000 and a maximum token length of 300.
|
16 |
+
- **Context Window Size**: 512 tokens.
|
17 |
+
- **Learning Rate**: 2.5e-3.
|
18 |
+
- **Optimizer**: LAMB.
|
19 |
+
- **Training Duration**: 300 epochs.
|
20 |
+
- **Parameters**: 43.6 million.
|
21 |
+
- **Training Environment**: Google Colab, utilizing an A100 GPU, with a training time of approximately 1.5 hours.
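
For replication, the hyperparameters listed above can be collected in one place. The following is a minimal sketch: the class and field names are illustrative, and `build_model` assumes that every setting not listed here keeps the `transformers` library default, which the README does not confirm. The LAMB optimizer is not wired up because the README does not say which implementation was used.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters taken from the list above; field names are illustrative."""
    architecture: str = "SqueezeBERT"
    vocab_size: int = 20_000       # tokenizer dictionary size
    max_token_length: int = 300    # maximum token length of the tokenizer
    context_window: int = 512      # tokens per input sequence
    learning_rate: float = 2.5e-3
    optimizer: str = "LAMB"
    epochs: int = 300


def build_model(setup: TrainingSetup):
    """Build an untrained SqueezeBERT masked-LM with the listed sizes.

    Requires the `transformers` package. All other configuration values are
    left at the library defaults, which is an assumption.
    """
    from transformers import SqueezeBertConfig, SqueezeBertForMaskedLM

    config = SqueezeBertConfig(
        vocab_size=setup.vocab_size,
        max_position_embeddings=setup.context_window,
    )
    return SqueezeBertForMaskedLM(config)


if __name__ == "__main__":
    # Print the settings as a plain dict, e.g. for logging alongside a run.
    print(asdict(TrainingSetup()))
```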
|
22 |
+
|
23 |
+
### Model Performance
|
24 |
+
The model was evaluated on test data. A plot showing how well it separates normal from anomalous execution traces is available [here](https://cdn-uploads.huggingface.co/production/uploads/661915f3fdcb7df8a6e2fdaa/bsZ8dXWN6IoAH4zXfwzi2.png).
|
25 |
+
|
26 |
+
The configuration and results above are provided to support replication and further experimentation by the community. The model is released under the Apache-2.0 license, which permits both academic and commercial use.
|