Update README.md
README.md
@@ -25,11 +25,9 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 
 ## Model Description
 
-<!-- Provide a longer summary of what this model is. -->
-
 - **Developed by:** [kNOwHATE: kNOwing online HATE speech: knowledge + awareness = TacklingHate](https://knowhate.eu)
 - **Funded by:** [European Union](https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/cerv-2021-equal)
-- **Model type:**
+- **Model type:** Transformer-based text classification model fine-tuned for hate speech in Portuguese social media text
 - **Language:** Portuguese
 - **Finetuned from model:** [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
 
@@ -39,11 +37,7 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 
 ## Training Data
 
-229,103 tweets associated with offensive content were used
-
-## Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+229,103 tweets associated with offensive content were used to retrain the base model
 
 ## Training Hyperparameters
 
@@ -52,13 +46,22 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 - Learning Rate: 5e-5 with Adam optimizer
 - Maximum Sequence Length: 512 sentence pieces
 
-## Evaluation
-
 ## Testing Data
 
+We used two different datasets for testing, one for YouTube comments [here](https://huggingface.co/datasets/knowhate/youtube-test) and another for Tweets [here](https://huggingface.co/datasets/knowhate/twitter-test).
+
+YouTube Test Set:
+- Total nº of comments: 825
+- % Hate Speech: 72.24%
+
+Twitter Test Set:
+- Total nº of tweets: 805
+- % Hate Speech: 20.62%
+
 ## Results
 
 
+
 ## BibTeX Citation
 
 [More Information Needed]
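The Training Hyperparameters section states only the learning rate (5e-5) and that Adam was used. As an illustrative sketch, not the authors' training code, one Adam update on a single scalar parameter looks like this. The betas (0.9, 0.999) and epsilon (1e-8) are assumed common defaults, since the README does not give them, and `adam_step` is a hypothetical helper name.

```python
# Sketch of a single Adam update on a scalar parameter, using the learning
# rate stated in the README (5e-5). BETA1, BETA2 and EPS are ASSUMED common
# defaults; the README does not state them.
import math

LR = 5e-5
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8


def adam_step(param, grad, m, v, t):
    """Apply one Adam update at step t (1-indexed); return (param, m, v)."""
    m = BETA1 * m + (1 - BETA1) * grad          # running mean of gradients
    v = BETA2 * v + (1 - BETA2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - BETA1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)                # bias-corrected second moment
    param -= LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


if __name__ == "__main__":
    p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, 1)
    print(p)
```

Because of the bias correction, the first update moves the parameter by roughly the learning rate regardless of the gradient's scale: `adam_step(1.0, 0.5, 0.0, 0.0, 1)` returns a parameter of approximately 0.99995.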
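The Testing Data section added in this commit reports each test set's size and hate-speech percentage but not the absolute class counts. The minimal sketch below recovers the implied counts and the accuracy of a trivial majority-class predictor, a useful reference point once the Results section is filled in. Rounding to the nearest whole item is an assumption, and the dictionary keys are only labels taken from the linked dataset URLs.

```python
# Sketch: derive approximate class counts and a majority-class baseline from
# the test-set statistics stated in the README. Rounding to the nearest
# integer is an ASSUMPTION; the README reports only totals and percentages.

TEST_SETS = {
    # label: (total items, hate-speech share in percent), as stated in the README
    "youtube-test": (825, 72.24),
    "twitter-test": (805, 20.62),
}


def class_counts(total: int, hate_pct: float) -> tuple[int, int]:
    """Return (hate, non_hate) counts implied by a total and a percentage."""
    hate = round(total * hate_pct / 100)
    return hate, total - hate


def majority_baseline(total: int, hate_pct: float) -> float:
    """Accuracy (in %) of always predicting the larger class."""
    hate, non_hate = class_counts(total, hate_pct)
    return 100 * max(hate, non_hate) / total


if __name__ == "__main__":
    for name, (total, pct) in TEST_SETS.items():
        hate, non_hate = class_counts(total, pct)
        baseline = majority_baseline(total, pct)
        print(f"{name}: {hate} hate / {non_hate} non-hate, "
              f"majority baseline {baseline:.2f}%")
```

Run as a script, it reports 596 hate / 229 non-hate comments (baseline 72.24%) for YouTube and 166 hate / 639 non-hate tweets (baseline 79.38%) for Twitter, so the two test sets are imbalanced in opposite directions and raw accuracy alone would be hard to compare across them.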