---
license: apache-2.0
language:
- en
---
# Toxicity_model

The Toxicity_model is used to differentiate polite from impolite responses.

The model was trained on a dataset composed of toxic and non-toxic responses.

## Details
- Size: 4,689,681 parameters
- Dataset: [Toxic Comment Classification Challenge Dataset](https://github.com/tianqwang/Toxic-Comment-Classification-Challenge)
- Language: English
- Number of training steps: 20
- Batch size: 16
- Optimizer: Adam
- Learning rate: 0.001
- GPU: T4
- The source [code used](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/master/ML%20Intro%20Course/15_toxicity_detection.ipynb) to train this model is available in the Teeny-Tiny Castle repository.
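
The hyperparameters above map onto a standard Keras training loop. The sketch below only illustrates that mapping and is not the actual training code: the placeholder architecture, toy data, label values, and binary cross-entropy loss are assumptions, and the real pipeline lives in the linked notebook.

```python
import tensorflow as tf

# Toy data, NOT the Toxic Comment Classification dataset.
# Label convention follows the usage snippet below: 1.0 = not toxic, 0.0 = toxic.
texts = tf.constant(["you are awful", "have a nice day"] * 160)
labels = tf.constant([0.0, 1.0] * 160)

# Same vectorization settings as in the usage snippet.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=20000, output_mode="int", output_sequence_length=100)
vectorizer.adapt(texts)

# Placeholder classifier over token ids (the real model has ~4.7M parameters).
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(20000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam, learning rate 0.001
    loss="binary_crossentropy",                               # assumed loss for toxic/non-toxic labels
    metrics=["accuracy"])

dataset = (tf.data.Dataset.from_tensor_slices((vectorizer(texts), labels))
           .shuffle(320)
           .batch(16))                                        # batch size 16
model.fit(dataset, epochs=1, steps_per_epoch=20)              # 20 training steps
```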

## Usage

⚠️ THE EXAMPLES BELOW CONTAIN TOXIC/OFFENSIVE LANGUAGE ⚠️

```python
import tensorflow as tf

# Load the trained classifier
toxicity_model = tf.keras.models.load_model('toxicity_model.keras')

# Load the vocabulary used during training
with open('toxic_vocabulary.txt', encoding='utf-8') as fp:
    vocabulary = [line.strip() for line in fp]

# Rebuild the text-vectorization layer with the saved vocabulary
vectorization_layer = tf.keras.layers.TextVectorization(
    max_tokens=20000,
    output_mode="int",
    output_sequence_length=100,
    vocabulary=vocabulary)

strings = [
    'I think you should shut up your big mouth',
    'I do not agree with you'
]

# The model outputs the probability that a string is NOT toxic
preds = toxicity_model.predict(vectorization_layer(strings), verbose=0)

for i, string in enumerate(strings):
    print(f'{string}\n')
    print(f'Toxic 🤬 {round((1 - preds[i][0]) * 100, 2)}% | Not toxic 😊 {round(preds[i][0] * 100, 2)}%\n')
    print("_" * 50)
```

This will output the following:
```
I think you should shut up your big mouth

Toxic 🤬 95.73% | Not toxic 😊 4.27%
__________________________________________________
I do not agree with you

Toxic 🤬 0.99% | Not toxic 😊 99.01%
__________________________________________________
```
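
If you need a discrete label rather than the percentages printed above, you can threshold the probability returned by the model. The helper below is a small illustration that reuses `toxicity_model` and `vectorization_layer` from the usage snippet; the function name and the 0.5 threshold are arbitrary choices, not part of this model card.

```python
def is_toxic(texts, threshold=0.5):
    """Return True for each input the model scores as toxic.

    Assumes `toxicity_model` and `vectorization_layer` are already loaded
    as in the usage snippet above. The 0.5 threshold is an arbitrary choice.
    """
    preds = toxicity_model.predict(vectorization_layer(texts), verbose=0)
    # preds[:, 0] is the not-toxic probability, so the toxic score is 1 - preds[:, 0]
    return [(1 - p[0]) >= threshold for p in preds]

print(is_toxic(['I think you should shut up your big mouth',
                'I do not agree with you']))
# With the scores shown above, this would print: [True, False]
```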

## Cite as 🤗
```
@misc{teenytinycastle,
  doi = {10.5281/zenodo.7112065},
  url = {https://huggingface.co/AiresPucrs/toxicity_model},
  author = {Nicholas Kluge Corr{\^e}a},
  title = {Teeny-Tiny Castle},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}
```

## License

Toxicity_model is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.