LaProfeClaudis committed on
Commit
59da638
verified
1 Parent(s): 67a3ea2

Update README.md

Files changed (1)
  1. README.md +21 -29
README.md CHANGED
@@ -1,27 +1,26 @@
- ---
- language:
- - es
- ---
 
- # Model Card for Model ID
-
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
 
  ## Model Details
 
- ### Model Description
 
- 1. The process begins by recovering discriminatory phrases for the LGBTQIA+ community from Twitter, Instagram and Tiktok (197 phrases) available at
- [https://zenodo.org/records/13756092]
- 2. Then preprocess them, label them by 3 evaluator raters as lgbtfobic (1) and non-lgbtfobic (0)
- 3. Augmentation was necessary applying backtranslation and synonyms (1200 phrases) available at .
- 4. Then adjust the BETO model based on BERT (dccuchile/bert-base-spanish-wwm-uncased) for discriminatory phrase detection for the LGBT community.
- 5. In this way the lgbetO v1.0.0 model was generated.
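Step 2 says each phrase was labelled by 3 raters, but the card does not state how disagreements between raters were resolved. The following is a minimal sketch assuming a simple majority vote over the binary labels (0 = non-lgbtphobic, 1 = lgbtphobic); the aggregation rule and the helper names `majority_label` and `aggregate_ratings` are illustrative assumptions, not the authors' documented procedure.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Label codes from the model card: 0 = non-lgbtphobic, 1 = lgbtphobic.
VALID_LABELS = (0, 1)


def majority_label(ratings: Sequence[int]) -> int:
    """Return the label chosen by most raters.

    Assumption: disagreements are resolved by simple majority vote.
    With binary labels and an odd number of raters there are no ties.
    """
    if len(ratings) == 0 or len(ratings) % 2 == 0:
        raise ValueError("expected a non-empty, odd number of ratings")
    if any(r not in VALID_LABELS for r in ratings):
        raise ValueError("ratings must be 0 or 1")
    label, _count = Counter(ratings).most_common(1)[0]
    return label


def aggregate_ratings(
    ratings_by_phrase: Dict[str, Sequence[int]],
) -> Tuple[List[Tuple[str, int]], float]:
    """Return (phrase, majority label) pairs and the share of unanimous items."""
    labelled: List[Tuple[str, int]] = []
    unanimous = 0
    for phrase, ratings in ratings_by_phrase.items():
        labelled.append((phrase, majority_label(ratings)))
        if len(set(ratings)) == 1:
            unanimous += 1
    share = unanimous / len(ratings_by_phrase) if ratings_by_phrase else 0.0
    return labelled, share


# Hypothetical placeholder phrases with ratings from three raters.
example = {"frase A": [1, 1, 0], "frase B": [0, 0, 0]}
labelled, unanimous_share = aggregate_ratings(example)
# labelled == [("frase A", 1), ("frase B", 0)], unanimous_share == 0.5
```

Reporting the unanimous share alongside the labels gives a rough sense of how often the three raters disagreed; a chance-corrected statistic such as Fleiss' kappa would be the more standard choice if agreement figures are added to the card.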
 
  - **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
  - **Model type:** [text-classification]
  - **Language(s) (NLP):** [Spanish]
- - **License:** [cc-by-4.0]
  - **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information about the base model: [https://github.com/dccuchile/beto]
 
  ### Model Sources [optional]
@@ -55,13 +54,9 @@ This model has its own bias from having been adjusted with a small data set.<!--
  [More Information Needed]
 
  ### Recommendations
-
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
- #lybraries
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
  # Define the path from which the model will be loaded
  #load_directory = "./lgbetO"
@@ -69,7 +64,7 @@ from transformers import AutoModelForSequenceClassification, AutoTokenizer
  #model = AutoModelForSequenceClassification.from_pretrained(load_directory)
  # Load the tokenizer
  #tokenizer = AutoTokenizer.from_pretrained(load_directory)
- ## Training Details
 
  The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases related to the LGBT community from Twitter, Instagram and TikTok,
  preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,
@@ -77,10 +72,11 @@ and adjusting the BETO base model (dccuchile/bert-base-spanish-wwm-uncased) for
 
  ### Training Data
 
- https://zenodo.org/records/13756092
 
  ### Training Procedure
-
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
  #### Preprocessing [optional]
@@ -194,8 +190,4 @@ Disk: 33.96 GB/112.64 GB
 
  ## Model Card Authors [optional]
 
- [More Information Needed]
-
- ## Model Card Contact
-
- [cmartinez@ucsc.cl]

 
+ # Model Card for Model ID (work in progress)
+ This model is a fine-tuned version of BETO (uncased) for detecting offensive and discriminatory language against the LGBT community.
+ It could be used as a moderation service in forums and other digital spaces.
+ ## Model Card Contact
+ [cmartinez@ucsc.cl]
 
  ## Model Details
 
+ ### Model description process
+ - Discriminatory phrases targeting the LGBTQIA+ community were collected from X/Twitter, Instagram and TikTok (197 phrases).
+ - The phrases were labelled by 3 raters as non-lgbtphobic (0) or lgbtphobic (1).
+ - Text augmentation was applied using backtranslation and random synonym replacement.
+ - Part of the McGiff, J., & Nikolov, N. S. (2024) dataset was translated into Spanish and added (under the CC-BY-4.0 licence).
+ - Finally, we obtained 1234 labelled phrases for version 1.0.1 of LGBTQIAphobia_augmented. Please cite the dataset as:
 
+ Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
+ LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
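The list above mentions random synonym replacement without naming the synonym source, the replacement rate, or the number of variants generated per phrase. A minimal sketch under those unknowns: the toy dictionary `SYNONYMS`, the probability `p` and the helpers `synonym_replace` and `augment_with_synonyms` are hypothetical and chosen only to illustrate the technique. Each variant keeps the label of its source phrase, and a seeded `random.Random` keeps runs reproducible.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy synonym dictionary (neutral Spanish words). The synonym
# source actually used for LGBTQIAphobia_augmented is not documented here.
SYNONYMS: Dict[str, List[str]] = {
    "gente": ["personas"],
    "casa": ["hogar", "vivienda"],
    "grande": ["enorme", "amplio"],
}


def synonym_replace(
    text: str, synonyms: Dict[str, List[str]], p: float, rng: random.Random
) -> str:
    """Swap each word found in `synonyms` for a random synonym with probability p.

    Uses plain whitespace tokenization, so words with attached punctuation
    are left untouched.
    """
    words = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)


def augment_with_synonyms(
    dataset: Iterable[Tuple[str, int]],
    synonyms: Dict[str, List[str]],
    n_variants: int = 2,
    p: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return each (text, label) pair plus up to n_variants synonym-replaced copies.

    Labels are copied unchanged; exact duplicate texts are dropped.
    """
    rng = random.Random(seed)
    seen = set()
    augmented: List[Tuple[str, int]] = []
    for text, label in dataset:
        candidates = [text] + [
            synonym_replace(text, synonyms, p, rng) for _ in range(n_variants)
        ]
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented
```

For example, `augment_with_synonyms([("la casa es grande", 0)], SYNONYMS)` returns the original pair followed by up to two variants with the same label. Copying labels assumes a synonym swap never changes whether a phrase is lgbtphobic, which is worth spot-checking on real data.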
 
 
 
 
 
  - **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
  - **Model type:** [text-classification]
  - **Language(s) (NLP):** [Spanish]
+ - **License:** [CC-BY-4.0]
  - **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information about the base model: [https://github.com/dccuchile/beto]
 
  ### Model Sources [optional]
 
  [More Information Needed]
 
  ### Recommendations
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+ # How to Get Started with the Model
+ #libraries
  from transformers import AutoModelForSequenceClassification, AutoTokenizer
  # Define the path from which the model will be loaded
  #load_directory = "./lgbetO"

  #model = AutoModelForSequenceClassification.from_pretrained(load_directory)
  # Load the tokenizer
  #tokenizer = AutoTokenizer.from_pretrained(load_directory)
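The loading lines above are commented out and stop before inference. A minimal sketch of the remaining steps, assuming the fine-tuned model is saved in `./lgbetO` (the path from the commented code) and that output index 0 means non-lgbtphobic and index 1 means lgbtphobic, following the labelling scheme in this card; check the checkpoint's `config.id2label` before relying on that mapping. `predict_logits` needs `transformers` and `torch` and imports them lazily; `softmax`, `to_predictions` and the `threshold` parameter are illustrative helpers, not part of any published API.

```python
import math
from typing import Dict, List, Sequence

# Assumed index-to-label mapping, following the card's 0/1 labelling scheme.
LABELS = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def predict_logits(texts: List[str], load_directory: str = "./lgbetO") -> List[List[float]]:
    """Run the fine-tuned classifier and return raw logits for each text.

    Requires `transformers` and `torch`; they are imported here so the
    pure-Python helpers below work without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(load_directory)
    model = AutoModelForSequenceClassification.from_pretrained(load_directory)
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.tolist()


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_predictions(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Dict[str, object]]:
    """Map logits to labels, flagging a text when P(class 1) >= threshold."""
    predictions: List[Dict[str, object]] = []
    for row in batch_logits:
        p_lgbtphobic = softmax(row)[1]
        label_id = 1 if p_lgbtphobic >= threshold else 0
        predictions.append({"label": LABELS[label_id], "score": p_lgbtphobic})
    return predictions
```

For example, `to_predictions(predict_logits(["texto de ejemplo"]))` would return one dictionary per input text. Raising `threshold` above 0.5 flags fewer texts at the cost of missing some, which may suit a moderation setting where flagged posts are sent to human review.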
+ # Training Details
 
  The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases related to the LGBT community from Twitter, Instagram and TikTok,
  preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,

 
  ### Training Data
 
+ Citation
+ Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
+ LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
 
  ### Training Procedure
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
  #### Preprocessing [optional]
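The card lists backtranslation among the data preparation steps but does not say which translation system or pivot language was used. A minimal sketch under those unknowns: `translate(text, source_lang, target_lang)` is a hypothetical callable standing in for any machine-translation system, English is assumed as the pivot, and `toy_translate` is a toy phrase table for illustration only. A round-tripped paraphrase is kept only when it differs from everything already collected.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: translate(text, source_lang, target_lang) -> str.
# Plug in whatever machine-translation system is available.
Translator = Callable[[str, str, str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def backtranslate(text: str, translate: Translator, src: str = "es", pivot: str = "en") -> str:
    """Round-trip text through a pivot language: src -> pivot -> src."""
    return translate(translate(text, src, pivot), pivot, src)


def augment_with_backtranslation(
    dataset: Iterable[Tuple[str, int]],
    translate: Translator,
    pivots: Tuple[str, ...] = ("en",),
) -> List[Tuple[str, int]]:
    """Return originals plus back-translated paraphrases, copying each label.

    A candidate is kept only if its normalized form has not been seen yet
    (across the whole dataset), so round trips that reproduce the input add nothing.
    """
    seen = set()
    augmented: List[Tuple[str, int]] = []
    for text, label in dataset:
        candidates = [text] + [backtranslate(text, translate, "es", pivot) for pivot in pivots]
        for candidate in candidates:
            key = normalize(candidate)
            if key not in seen:
                seen.add(key)
                augmented.append((candidate, label))
    return augmented


# Toy phrase-table "translator", for illustration only.
TOY_TABLE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("es", "en"): {"me gusta la casa": "i like the house"},
    ("en", "es"): {"i like the house": "me agrada la casa"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Look the text up in TOY_TABLE; return it unchanged if not found."""
    return TOY_TABLE.get((src, tgt), {}).get(text, text)
```

With the toy table, `augment_with_backtranslation([("me gusta la casa", 0)], toy_translate)` yields `[("me gusta la casa", 0), ("me agrada la casa", 0)]`. Adding more entries to `pivots` produces more varied paraphrases, at the cost of more translation calls.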
 
 
  ## Model Card Authors [optional]
 
+ [More Information Needed]