dougtrajano committed
Commit f65aa57
1 Parent(s): c0198b8

Update README.md

Files changed (1):
1. README.md (+50 −31)
README.md CHANGED
@@ -16,38 +16,68 @@ metrics:
  model-index:
  - name: dougtrajano/toxic-comment-classification
    results: []
+ datasets:
+ - dougtrajano/olid-br
+ library_name: transformers
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # dougtrajano/toxic-comment-classification
-
- This model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the OLID-BR dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5590
- - Accuracy: 0.8578
- - F1: 0.8580
- - Precision: 0.8594
- - Recall: 0.8578
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
 
+ Toxic Comment Classification is a model that detects whether a text is toxic.
+
+ This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
+
+ ## Overview
+
+ **Input:** Text in Brazilian Portuguese
+
+ **Output:** Binary classification (toxic or not toxic)
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxic-comment-classification")
+
+ model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxic-comment-classification")
+ ```
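The snippet above only loads the artifacts. A minimal inference sketch follows; it is not part of the model card, and the example sentence is an assumption (the class names are taken from the Performance section below):

```python
import torch

# Reuses the `tokenizer` and `model` loaded in the snippet above.
text = "Que pessoa horrível!"  # hypothetical input ("What a horrible person!")

# Tokenize and run a forward pass without gradient tracking.
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class id to its name, e.g. NOT-OFFENSIVE / OFFENSIVE.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```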
+
+ ## Limitations and bias
+
+ The following factors may degrade the model's performance.
+
+ **Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well on other varieties of Portuguese (e.g., European Portuguese).
+
+ **Text Origin**: The model was trained mostly on social media texts, with a few texts from other sources, so it may not work well on other types of text.
+
+ ## Trade-offs
+
+ Models can exhibit performance issues under particular circumstances. This section describes situations in which this model may perform less than optimally, so that you can plan accordingly.
+
+ **Text Length**: The model was fine-tuned on texts of 1 to 178 words (18 words on average), so it may give poor results on texts whose length falls outside that range.
+
+ ## Performance
+
+ The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.
+
+ **Accuracy:** 0.8578
+
+ **Precision:** 0.8594
+
+ **Recall:** 0.8578
+
+ **F1-Score:** 0.8580
+
+ | Class | Precision | Recall | F1-Score | Support |
+ | :---: | :-------: | :----: | :------: | :-----: |
+ | `NOT-OFFENSIVE` | 0.8886 | 0.8490 | 0.8683 | 1,775 |
+ | `OFFENSIVE` | 0.8233 | 0.8686 | 0.8453 | 1,438 |
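Per-class tables like the one above are typically produced with a classification report. A hypothetical scikit-learn sketch (the label ids below are dummy stand-ins, not the actual predictions over the 3,213 test examples):

```python
from sklearn.metrics import classification_report

# Dummy stand-ins: in practice y_true would hold the gold OLID-BR test labels
# and y_pred the model's predictions (see the Usage section for inference).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

# Prints per-class precision, recall, F1-score, and support.
print(classification_report(y_true, y_pred, target_names=["NOT-OFFENSIVE", "OFFENSIVE"]))
```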
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
  - learning_rate: 3.255788747459486e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -57,24 +87,13 @@ The following hyperparameters were used during training:
  - num_epochs: 30
  - label_smoothing_factor: 0.07158711257743958
 
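These names match fields of `transformers.TrainingArguments`. As a hypothetical sketch of the configuration (the output directory is assumed, `train_batch_size` is assumed to map to the per-device batch size, and all unlisted arguments are left at their defaults):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction: only hyperparameters listed in the card are set.
training_args = TrainingArguments(
    output_dir="toxic-comment-classification",  # assumed, not from the card
    learning_rate=3.255788747459486e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=30,
    label_smoothing_factor=0.07158711257743958,
)
```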
- ### Training results
-
- | Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     | Precision | Recall |
- |:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
- | 0.4422        | 1.0   | 1408  | 0.4197          | 0.8466   | 0.8470 | 0.8505    | 0.8466 |
- | 0.3566        | 2.0   | 2816  | 0.4724          | 0.8413   | 0.8394 | 0.8453    | 0.8413 |
- | 0.3135        | 3.0   | 4224  | 0.4801          | 0.8447   | 0.8434 | 0.8470    | 0.8447 |
- | 0.2638        | 4.0   | 5632  | 0.5590          | 0.8578   | 0.8580 | 0.8594    | 0.8578 |
- | 0.2314        | 5.0   | 7040  | 0.5605          | 0.8491   | 0.8487 | 0.8489    | 0.8491 |
- | 0.2221        | 6.0   | 8448  | 0.6369          | 0.8416   | 0.8414 | 0.8414    | 0.8416 |
- | 0.1939        | 7.0   | 9856  | 0.6518          | 0.8400   | 0.8402 | 0.8405    | 0.8400 |
- | 0.2015        | 8.0   | 11264 | 0.6042          | 0.8462   | 0.8457 | 0.8465    | 0.8462 |
- | 0.1989        | 9.0   | 12672 | 0.6236          | 0.8500   | 0.8496 | 0.8499    | 0.8500 |
-
  ### Framework versions
 
  - Transformers 4.26.0
  - Pytorch 1.10.2+cu113
  - Datasets 2.9.0
  - Tokenizers 0.13.2
+
+ ## Provide Feedback
+
+ If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.