Update README.md
README.md
@@ -31,22 +31,32 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 - **Language:** Portuguese
 - **Finetuned from model:** [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
 
+<br>
+
 ## Uses
 
 [More Information Needed]
 
-
+<br>
+
+## Training
+
+### Data
 
 229,103 tweets associated with offensive content were used to retrain the base model.
 
-
+### Training Hyperparameters
 
 - Batch Size: 4 samples
 - Epochs: 100
 - Learning Rate: 5e-5 with Adam optimizer
 - Maximum Sequence Length: 512 sentence pieces
 
-
+<br>
+
+## Testing
+
+### Data
 
 We used two different datasets for testing, one for YouTube comments [here](https://huggingface.co/datasets/knowhate/youtube-test) and another for Tweets [here](https://huggingface.co/datasets/knowhate/twitter-test).
 
@@ -58,13 +68,15 @@ Twitter Test Set:
 - Total nº of tweets: 805
 - % Hate Speech: 20.62%
 
-
+### Results
 
 | Dataset | Precision | Recall | F1-score |
-
+|:----------------|:-----------|:----------|:-------------|
 | **YouTube** | 0.928 | 0.108 | **0.193** |
 | **Twitter** | 0.686 | 0.211 | **0.323** |
 
+<br>
+
 ## BibTeX Citation
 
 ``` latex
@@ -82,7 +94,7 @@ copyright = {embargoed-access},
 }
 ```
 
-
+<br>
 
 ## Acknowledgements
 
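The Training Hyperparameters, together with the 229,103-tweet training set, determine the size of the training run. The Python sketch below works that out under stated assumptions: every tweet is seen once per epoch, the final partial batch is kept, and each batch produces one optimizer step with no gradient accumulation. The `TrainingConfig` class and its field names are hypothetical and do not come from the authors' training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in the README (field names are hypothetical)."""
    num_examples: int = 229_103   # tweets used to retrain the base model
    batch_size: int = 4           # samples per batch
    epochs: int = 100
    learning_rate: float = 5e-5   # used with the Adam optimizer
    max_seq_length: int = 512     # sentence pieces per input

    def steps_per_epoch(self) -> int:
        # Assumption: the last, smaller batch is kept rather than dropped.
        return math.ceil(self.num_examples / self.batch_size)

    def total_steps(self) -> int:
        # Assumption: one optimizer step per batch (no gradient accumulation).
        return self.steps_per_epoch() * self.epochs

    def max_tokens_per_batch(self) -> int:
        # Upper bound: every sequence in the batch filled to max_seq_length.
        return self.batch_size * self.max_seq_length


cfg = TrainingConfig()
print(cfg.steps_per_epoch())       # 57276
print(cfg.total_steps())           # 5727600
print(cfg.max_tokens_per_batch())  # 2048
```

Under the opposite convention, dropping the last incomplete batch (as PyTorch's `DataLoader(drop_last=True)` does), each epoch would have 229,103 // 4 = 57,275 steps instead.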
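The F1-scores in the Results table are the harmonic mean of the precision and recall beside them, F1 = 2PR / (P + R), which is why the very low recall on both test sets pulls F1 far below precision. A short Python check using only the README's figures follows. Those figures are rounded to three decimals, so agreement is only expected at that precision. It also converts the Twitter test set's 20.62% into a tweet count, assuming the percentage was rounded from a whole number of hate-speech tweets. `f1_score` is a helper written for this illustration, not a library function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the README's Results table.
results = {
    "YouTube": (0.928, 0.108),
    "Twitter": (0.686, 0.211),
}

for dataset, (p, r) in results.items():
    print(f"{dataset}: F1 = {f1_score(p, r):.3f}")
# YouTube: F1 = 0.193
# Twitter: F1 = 0.323

# Twitter test set: 805 tweets, 20.62% labelled as hate speech.
twitter_total = 805
twitter_hate = round(twitter_total * 0.2062)       # assumes an integer count
print(twitter_hate, f"{twitter_hate / twitter_total:.2%}")  # 166 20.62%
```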