dougtrajano committed
Commit 30079e1
1 Parent(s): 191b08d

update model card README.md

Files changed (1)
  1. README.md (+31, -61)
README.md CHANGED
@@ -1,12 +1,6 @@
 ---
-language:
-- pt
-license: apache-2.0
+license: mit
 tags:
-- toxicity
-- portuguese
-- hate speech
-- offensive language
 - generated_from_trainer
 metrics:
 - accuracy
@@ -14,72 +8,40 @@ metrics:
 - precision
 - recall
 model-index:
-- name: dougtrajano/toxic-comment-classification
+- name: toxic-comment-classification
   results: []
-datasets:
-- dougtrajano/olid-br
-library_name: transformers
 ---
 
-# dougtrajano/toxic-comment-classification
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
-Toxic Comment Classification is a model that detects if the text is toxic or not.
+# toxic-comment-classification
 
-This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
+This model is a fine-tuned version of [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.4102
+- Accuracy: 0.8547
+- F1: 0.8549
+- Precision: 0.8669
+- Recall: 0.8547
 
-## Overview
+## Model description
 
-**Input:** Text in Brazilian Portuguese
+More information needed
 
-**Output:** Binary classification (toxic or not toxic)
+## Intended uses & limitations
 
-## Usage
+More information needed
 
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
+## Training and evaluation data
 
-tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxic-comment-classification")
-
-model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxic-comment-classification")
-```
-
-## Limitations and bias
-
-The following factors may degrade the model’s performance.
-
-**Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well with Portuguese dialects.
-
-**Text Origin**: The model was trained on texts from social media and a few texts from other sources, so it may not work well on other types of texts.
-
-## Trade-offs
-
-Sometimes models exhibit performance issues under particular circumstances. In this section, we'll discuss situations in which you might discover that the model performs less than optimally, and should plan accordingly.
-
-**Text Length**: The model was fine-tuned on texts with a word count between 1 and 178 words (average of 18 words). It may give poor results on texts with a word count outside this range.
-
-## Performance
-
-The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.
-
-**Accuracy:** 0.8578
-
-**Precision:** 0.8594
-
-**Recall:** 0.8578
-
-**F1-Score:** 0.8580
-
-| Class | Precision | Recall | F1-Score | Support |
-| :---: | :-------: | :----: | :------: | :-----: |
-| `NOT-OFFENSIVE` | 0.8886 | 0.8490 | 0.8683 | 1,775 |
-| `OFFENSIVE` | 0.8233 | 0.8686 | 0.8453 | 1,438 |
+More information needed
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-
 - learning_rate: 3.255788747459486e-05
 - train_batch_size: 8
 - eval_batch_size: 8
@@ -89,13 +51,21 @@ The following hyperparameters were used during training:
 - num_epochs: 30
 - label_smoothing_factor: 0.07158711257743958
 
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
+| 0.4465 | 1.0 | 1408 | 0.4102 | 0.8547 | 0.8549 | 0.8669 | 0.8547 |
+| 0.3839 | 2.0 | 2816 | 0.4814 | 0.8509 | 0.8497 | 0.8532 | 0.8509 |
+| 0.3945 | 3.0 | 4224 | 0.6362 | 0.8002 | 0.7918 | 0.8258 | 0.8002 |
+| 0.3643 | 4.0 | 5632 | 0.4961 | 0.8248 | 0.8211 | 0.8349 | 0.8248 |
+| 0.3345 | 5.0 | 7040 | 0.5267 | 0.8528 | 0.8532 | 0.8570 | 0.8528 |
+| 0.3053 | 6.0 | 8448 | 0.5902 | 0.8002 | 0.7911 | 0.8292 | 0.8002 |
+
+
 ### Framework versions
 
-- Transformers 4.26.0
+- Transformers 4.26.1
 - Pytorch 1.10.2+cu113
 - Datasets 2.9.0
 - Tokenizers 0.13.2
-
-## Provide Feedback
-
-If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.
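
The Usage snippet removed by this commit only loaded the tokenizer and model. For reference, a minimal inference sketch built on that snippet is below; it assumes the repo id from the old card and that the checkpoint's `id2label` config carries the `NOT-OFFENSIVE`/`OFFENSIVE` labels shown in the old performance table. The example text is illustrative.

```python
# Minimal inference sketch based on the Usage snippet removed in this commit.
# Assumes the repo id from the old card and that the checkpoint's id2label
# config maps to the NOT-OFFENSIVE/OFFENSIVE labels from the old card's table.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxic-comment-classification")
model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxic-comment-classification")

text = "Que comentário desagradável!"  # illustrative input (Brazilian Portuguese)
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1).squeeze()
label = model.config.id2label[int(probs.argmax())]
print(label, round(float(probs.max()), 4))
```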
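The hyperparameters listed in the card map onto `transformers.TrainingArguments` fields; a sketch under that assumption is below. Only the values visible in the diff are set (the collapsed hunk hides the seed, optimizer, and scheduler lines), and `output_dir` is a placeholder.

```python
# Sketch: the card's listed hyperparameters expressed as TrainingArguments.
# Only values visible in the diff are set; hyperparameters hidden in the
# collapsed hunk (seed, optimizer, lr scheduler) are omitted. output_dir is
# a placeholder, and train/eval batch sizes are assumed to be per-device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="toxic-comment-classification",  # placeholder path
    learning_rate=3.255788747459486e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=30,
    label_smoothing_factor=0.07158711257743958,
)
```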