knowhate/HateBERTimbau

Not-For-All-Audiences

gilramos committed on May 13, 2024

Commit 8761ba5 · verified · 1 Parent(s): 89a89a4

Update README.md

Files changed (1)

README.md +13 -10

@@ -25,11 +25,9 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 ## Model Description
-<!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [kNOwHATE: kNOwing online HATE speech: knowledge + awareness = TacklingHate](https://knowhate.eu)
 - **Funded by:** [European Union](https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/cerv-2021-equal)
-- **Model type:** [More Information Needed]
 - **Language:** Portuguese
 - **Finetuned from model:** [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
@@ -39,11 +37,7 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 ## Training Data
-229,103 tweets associated with offensive content were used
-## Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 ## Training Hyperparameters
@@ -52,13 +46,22 @@ HateBERTimbau is a transformer-based encoder model for identifying hate speech i
 - Learning Rate: 5e-5 with Adam optimizer
 - Maximum Sequence Length: 512 sentence pieces
-## Evaluation
 ## Testing Data
 ## Results
 ## BibTeX Citation
 [More Information Needed]

 ## Model Description
 - **Developed by:** [kNOwHATE: kNOwing online HATE speech: knowledge + awareness = TacklingHate](https://knowhate.eu)
 - **Funded by:** [European Union](https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/cerv-2021-equal)
+- **Model type:** Transformer-based text classification model fine-tuned for hate speech in Portuguese social media text
 - **Language:** Portuguese
 - **Finetuned from model:** [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased)
 ## Training Data
+229,103 tweets associated with offensive content were used to retrain the base model
 ## Training Hyperparameters
 - Learning Rate: 5e-5 with Adam optimizer
 - Maximum Sequence Length: 512 sentence pieces
 ## Testing Data
+We used two different datasets for testing, one for YouTube comments [here](https://huggingface.co/datasets/knowhate/youtube-test) and another for Tweets [here](https://huggingface.co/datasets/knowhate/twitter-test).
+YouTube Test Set:
+- Total nº of comments: 825
+- % Hate Speech: 72.24%
+Twitter Test Set:
+- Total nº of tweets: 805
+- % Hate Speech: 20.62%
 ## Results
 ## BibTeX Citation
 [More Information Needed]
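
The Testing Data figures added in this commit give only totals and percentages. As a rough sanity check, the sketch below derives the approximate number of hate speech items each percentage implies. The names `TEST_SETS`, `implied_hate_count`, and `summarize` are illustrative and not part of the repository, and the derived counts are rounded estimates, not values reported in the README.

```python
# Test-set figures as stated in the README diff (totals and % hate speech).
TEST_SETS = {
    "YouTube": {"total": 825, "hate_pct": 72.24},
    "Twitter": {"total": 805, "hate_pct": 20.62},
}


def implied_hate_count(total: int, hate_pct: float) -> int:
    """Nearest whole number of items consistent with a reported percentage."""
    return round(total * hate_pct / 100)


def summarize(test_sets: dict) -> dict:
    """Derive approximate class counts and re-check the percentage for each set."""
    summary = {}
    for name, stats in test_sets.items():
        hate = implied_hate_count(stats["total"], stats["hate_pct"])
        summary[name] = {
            "hate": hate,                          # estimated, not reported
            "non_hate": stats["total"] - hate,     # estimated, not reported
            "check_pct": round(100 * hate / stats["total"], 2),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(TEST_SETS).items():
        print(name, row)
```

Under these assumptions the YouTube set works out to roughly 596 hate speech comments out of 825 and the Twitter set to roughly 166 out of 805. The two test sets therefore have very different class balances, which is worth keeping in mind when comparing results across them.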