Update README.md
README.md
CHANGED
@@ -1,27 +1,26 @@
|
|
1 |
-
---
|
2 |
-
language:
|
3 |
-
- es
|
4 |
-
---
|
5 |
|
6 |
-
# Model Card for Model ID
|
7 |
-
|
8 |
-
|
|
|
|
|
9 |
|
10 |
## Model Details
|
11 |
|
12 |
-
### Model
13 |
|
14 |
-
|
15 |
-
[https://
|
16 |
-
2. Then preprocess them, label them by 3 evaluator raters como lgbtfobic (1) and non-lgbtfobic (0)
|
17 |
-
3. Augmentation was necessary applying backtranslation and synonyms (1200 phrases) available at .
|
18 |
-
4. Then adjust the BETO model based on BERT (dccuchile/bert-base-spanish-wwm-uncased) for discriminatory phrase detection for the LGBT community.
|
19 |
-
5. In this way the lgbetO v1.0.0 model was generated.
|
20 |
|
21 |
- **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutierrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
|
22 |
- **Model type:** [text-classification]
|
23 |
- **Language(s) (NLP):** [Spanish]
|
24 |
-
- **License:** [
|
25 |
- **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information of base model [https://github.com/dccuchile/beto]
|
26 |
|
27 |
### Model Sources [optional]
|
@@ -55,13 +54,9 @@ This model has its own bias from having been adjusted with a small data set.<!--
|
|
55 |
[More Information Needed]
|
56 |
|
57 |
### Recommendations
|
58 |
-
|
59 |
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
## How to Get Started with the Model
|
64 |
-
#lybraries
|
65 |
from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
66 |
# Define the path from which the model will be loaded
|
67 |
#load_directory = "./lgbetO"
|
@@ -69,7 +64,7 @@ from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
|
69 |
#model = AutoModelForSequenceClassification.from_pretrained(load_directory)
|
70 |
# Load the tokenizer
|
71 |
#tokenizer = AutoTokenizer.from_pretrained(load_directory)
|
72 |
-
|
73 |
|
74 |
The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases about the LGBT community from Twitter, Instagram and TikTok,
|
75 |
preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,
|
@@ -77,10 +72,11 @@ and adjusting the BETO base model (dccuchile/bert-base-spanish-wwm-uncased) for
|
|
77 |
|
78 |
### Training Data
|
79 |
|
80 |
-
|
|
|
|
|
81 |
|
82 |
### Training Procedure
|
83 |
-
|
84 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
85 |
|
86 |
#### Preprocessing [optional]
|
@@ -194,8 +190,4 @@ Disco: 33.96 GB/112.64 GB
|
|
194 |
|
195 |
## Model Card Authors [optional]
|
196 |
|
197 |
-
[More Information Needed]
|
198 |
-
|
199 |
-
## Model Card Contact
|
200 |
-
|
201 |
-
[cmartinez@ucsc.cl]
1 |
|
2 |
+
# Model Card for Model ID (work in progress)
|
3 |
+
This model is a fine-tuned version of BETO (uncased) that detects offensive and discriminatory language against the LGBT community.
|
4 |
+
It could be used as a moderation service in forums and digital spaces.
|
5 |
+
## Model Card Contact
|
6 |
+
[cmartinez@ucsc.cl]
|
7 |
|
8 |
## Model Details
|
9 |
|
10 |
+
### Model description process
|
11 |
+
-Collection of phrases that discriminate against the LGBTQIA+ community from X/Twitter, Instagram and TikTok (197 phrases).
|
12 |
+
-Labelling by 3 raters as non-lgbtphobic (0) and lgbtphobic (1).
|
13 |
+
-Text augmentation was applied using backtranslation and random synonym replacement.
|
14 |
+
-Part of the McGiff, J., & Nikolov, N. S. (2024) dataset was translated to Spanish and added (under licence CC-BY-4.0).
|
15 |
+
-Finally, we obtained 1234 tagged phrases for version 1.0.1 of LGBTQIAphobia_augmented. Please cite the data set as:
|
16 |
|
17 |
+
Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
|
18 |
+
LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
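
The labelling and synonym-replacement steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the card does not say how the three raters' labels were combined (a simple majority vote is assumed here), and `majority_label`, `replace_synonyms`, and the tiny synonym dictionary are hypothetical names and data, not the authors' code.

```python
import random
from collections import Counter


def majority_label(votes):
    """Combine rater votes (0 = non-lgbtphobic, 1 = lgbtphobic) by majority.

    Assumption: the card only says 3 raters labelled each phrase;
    majority voting is one plausible way to merge their labels.
    """
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of raters to avoid ties")
    return Counter(votes).most_common(1)[0][0]


def replace_synonyms(phrase, synonyms, rng, prob=0.5):
    """Randomly swap words for a synonym from a hypothetical lookup table."""
    out = []
    for word in phrase.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))  # pick one synonym at random
        else:
            out.append(word)  # keep the original word
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    toy_synonyms = {"hola": ["saludos"]}  # illustrative only
    print(majority_label([1, 0, 1]))
    print(replace_synonyms("hola mundo", toy_synonyms, rng, prob=1.0))
```

Backtranslation is left out of the sketch because it depends on an external translation model that the card does not name.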
|
19 |
|
20 |
- **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutierrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
|
21 |
- **Model type:** [text-classification]
|
22 |
- **Language(s) (NLP):** [Spanish]
|
23 |
+
- **License:** [CC-BY-4.0]
|
24 |
- **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information of base model [https://github.com/dccuchile/beto]
|
25 |
|
26 |
### Model Sources [optional]
|
|
|
54 |
[More Information Needed]
|
55 |
|
56 |
### Recommendations
|
|
|
57 |
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
58 |
+
# How to Get Started with the Model
|
59 |
+
#libraries
|
|
|
|
|
|
|
60 |
from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
61 |
# Define the path from which the model will be loaded
|
62 |
#load_directory = "./lgbetO"
|
|
|
64 |
#model = AutoModelForSequenceClassification.from_pretrained(load_directory)
|
65 |
# Load the tokenizer
|
66 |
#tokenizer = AutoTokenizer.from_pretrained(load_directory)
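
Once the model and tokenizer are loaded, classifying a phrase might look like the sketch below. It assumes the checkpoint lives in `./lgbetO` (the path used in the snippet above), that the classifier head has two outputs in the order 0 = non-lgbtphobic, 1 = lgbtphobic (the label scheme described in this card), and that `torch` is installed. The helper names `decode` and `classify` are hypothetical.

```python
import math

# Label scheme taken from the card: 0 = non-lgbtphobic, 1 = lgbtphobic.
ID2LABEL = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map a pair of logits to (label, probability)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


def classify(text, load_directory="./lgbetO"):
    """Run one phrase through the fine-tuned model (assumed local path)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(load_directory)
    model = AutoModelForSequenceClassification.from_pretrained(load_directory)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

Calling `classify("some phrase")` would return a label and its probability. Given the small training set noted in this card, the result is best treated as a signal for human moderators rather than a final decision.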
|
67 |
+
# Training Details
|
68 |
|
69 |
The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases about the LGBT community from Twitter, Instagram and TikTok,
|
70 |
preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,
|
|
|
72 |
|
73 |
### Training Data
|
74 |
|
75 |
+
Citation
|
76 |
+
Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
|
77 |
+
LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
|
78 |
|
79 |
### Training Procedure
|
|
|
80 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
81 |
|
82 |
#### Preprocessing [optional]
|
|
|
190 |
|
191 |
## Model Card Authors [optional]
|
192 |
|
193 |
+
[More Information Needed]
|