LaProfeClaudis / lgbeTo


LaProfeClaudis committed on Dec 27, 2024

Commit 59da638 (verified) · 1 parent: 67a3ea2

Update README.md

Files changed (1)

README.md +21 -29

@@ -1,27 +1,26 @@
----
-language:
-- es
----
-# Model Card for Model ID
-This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
 ## Model Details
-### Model Description
-1. The process begins by collecting phrases that discriminate against the LGBTQIA+ community from Twitter, Instagram and TikTok (197 phrases), available at
-[https://zenodo.org/records/13756092]
-2. Then preprocess them and have 3 raters label them as LGBTphobic (1) or non-LGBTphobic (0).
-3. Augment the data with backtranslation and synonym replacement (1200 phrases), available at .
-4. Then fine-tune the BERT-based BETO model (dccuchile/bert-base-spanish-wwm-uncased) to detect phrases that discriminate against the LGBT community.
-5. In this way the lgbetO v1.0.0 model was generated.
 - **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
 - **Model type:** [text-classification]
 - **Language(s) (NLP):** [Spanish]
-- **License:** [cc-by-4.0]
 - **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information about the base model: [https://github.com/dccuchile/beto]
 ### Model Sources [optional]
@@ -55,13 +54,9 @@ This model has its own bias from having been adjusted with a small data set.<!--
 [More Information Needed]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-# Libraries
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 # Define the path from which the model will be loaded
 load_directory = "./lgbetO"
@@ -69,7 +64,7 @@ from transformers import AutoModelForSequenceClassification, AutoTokenizer
 model = AutoModelForSequenceClassification.from_pretrained(load_directory)
 # Load the tokenizer
 tokenizer = AutoTokenizer.from_pretrained(load_directory)
-## Training Details
 The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases related to the LGBT community from Twitter, Instagram and TikTok,
 preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,
@@ -77,10 +72,11 @@ and adjusting the BETO base model (dccuchile/bert-base -spanish-wwm-uncased) for
 ### Training Data
-https://zenodo.org/records/13756092
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
@@ -194,8 +190,4 @@ Disco: 33.96 GB/112.64 GB
 ## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[cmartinez@ucsc.cl]

+# Model Card for Model ID (in progress)
+This model is a fine-tuned version of BETO (uncased) for detecting offensive and discriminatory language against the LGBT community.
+It can be used as a moderation service for forums and other digital spaces.
+## Model Card Contact
+[cmartinez@ucsc.cl]
 ## Model Details
+### Model description process
+- Phrases discriminating against the LGBTQIA+ community were first collected from X/Twitter, Instagram and TikTok (197 phrases).
+- Each phrase was labeled by 3 raters as non-LGBTphobic (0) or LGBTphobic (1).
+- Text augmentation was applied using backtranslation and random synonym replacement.
+- Part of the McGiff, J., & Nikolov, N. S. (2024) dataset (licensed under CC-BY-4.0) was translated into Spanish and added.
+- In total, we obtained 1234 labeled phrases for version 1.0.1 of LGBTQIAphobia_augmented. Please cite the dataset as:
+Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
+LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
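The card states that each phrase was labeled by three raters but does not say how their votes were combined into a single label. The sketch below assumes a simple majority vote (an assumption, not the authors' documented procedure); the phrase IDs and votes are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Label scheme from the model card: 0 = non-LGBTphobic, 1 = LGBTphobic.
NON_LGBTPHOBIC, LGBTPHOBIC = 0, 1


def majority_label(votes: Sequence[int]) -> int:
    """Aggregate one phrase's rater votes by simple majority (assumed rule).

    Requires an odd number of binary votes (3 raters in the card), so a
    tie cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is required to avoid ties")
    if any(v not in (NON_LGBTPHOBIC, LGBTPHOBIC) for v in votes):
        raise ValueError("votes must be 0 or 1")
    label, _count = Counter(votes).most_common(1)[0]
    return label


def agreement(votes: Sequence[int]) -> float:
    """Fraction of raters who agree with the majority label."""
    label = majority_label(votes)
    return sum(v == label for v in votes) / len(votes)


# Hypothetical annotations: phrase ID -> votes from the three raters.
annotations = {
    "phrase_001": [1, 1, 0],
    "phrase_002": [0, 0, 0],
}
gold = {pid: majority_label(votes) for pid, votes in annotations.items()}
```

With three raters and binary labels a majority always exists, which is why the sketch rejects even-sized vote lists instead of inventing a tie-breaking rule. The `agreement` score is one simple way to flag phrases on which the raters disagreed.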
 - **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
 - **Model type:** [text-classification]
 - **Language(s) (NLP):** [Spanish]
+- **License:** [CC-BY-4.0]
 - **Finetuned from model [dccuchile/bert-base-spanish-wwm-uncased]:** More information about the base model: [https://github.com/dccuchile/beto]
 ### Model Sources [optional]
 [More Information Needed]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+# How to Get Started with the Model
+# Libraries
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 # Define the path from which the model will be loaded
 load_directory = "./lgbetO"
 model = AutoModelForSequenceClassification.from_pretrained(load_directory)
 # Load the tokenizer
 tokenizer = AutoTokenizer.from_pretrained(load_directory)
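The loading snippet only creates the model and tokenizer. A minimal inference sketch follows. It assumes the fine-tuned checkpoint is saved in `./lgbetO` and that output index 0 means non-LGBTphobic and index 1 means LGBTphobic, mirroring the dataset labels in this card (the card does not confirm how the classification head's indices are mapped). The helper names `softmax`, `decode` and `classify` are hypothetical.

```python
import math
from typing import Sequence

# Assumed index -> label mapping, following the dataset labels in the card
# (0 = non-LGBTphobic, 1 = LGBTphobic).
ID2LABEL = {0: "non-lgbtphobic", 1: "lgbtphobic"}


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(texts: list[str], load_directory: str = "./lgbetO") -> list[tuple[str, float]]:
    """Classify Spanish phrases with the fine-tuned checkpoint in load_directory."""
    # Imported lazily so the pure-Python helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(load_directory)
    model = AutoModelForSequenceClassification.from_pretrained(load_directory)
    model.eval()  # disable dropout for inference
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (len(texts), num_labels)
    return [decode(row) for row in logits.tolist()]
```

For example, `classify(["texto de ejemplo"])` returns one `(label, probability)` pair per input phrase. If the saved configuration defines meaningful label names, `model.config.id2label` can replace the hard-coded `ID2LABEL`. Loading the model inside `classify` keeps the sketch short; a moderation service would load it once and reuse it across requests.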
+# Training Details
 The training process begins by retrieving offensive/non-offensive and discriminatory/non-discriminatory phrases related to the LGBT community from Twitter, Instagram and TikTok,
 preprocessing them, labeling them by 3 raters, augmenting them with backtranslation and synonyms,
 ### Training Data
+Citation
+Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024).
+LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
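The card lists backtranslation and random synonym replacement as augmentation steps but does not describe how they were implemented. A minimal sketch of both follows. It assumes whitespace tokenization, uses a hypothetical two-entry Spanish synonym lexicon, and expects the caller to supply the translation functions.

```python
import random
from typing import Callable, Mapping

# Hypothetical, tiny synonym lexicon for illustration only; the card does not
# say which synonym source was used.
SYNONYMS: Mapping[str, list[str]] = {
    "casa": ["hogar", "vivienda"],
    "grande": ["enorme", "amplio"],
}


def synonym_replace(text: str, synonyms: Mapping[str, list[str]], n: int, rng: random.Random) -> str:
    """Replace up to n randomly chosen words that have an entry in `synonyms`."""
    words = text.split()
    # Positions of words the lexicon knows about (case-insensitive lookup).
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    for i in rng.sample(candidates, k=min(n, len(candidates))):
        words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


def back_translate(text: str, to_pivot: Callable[[str], str], from_pivot: Callable[[str], str]) -> str:
    """Paraphrase by translating to a pivot language and back.

    `to_pivot` and `from_pivot` stand in for any machine-translation system.
    """
    return from_pivot(to_pivot(text))
```

Passing a seeded `random.Random` makes augmented phrases reproducible. For real backtranslation, `to_pivot` and `from_pivot` could wrap machine-translation models such as the MarianMT checkpoints `Helsinki-NLP/opus-mt-es-en` and `Helsinki-NLP/opus-mt-en-es`; the card does not say which system the authors used. Words with attached punctuation are not matched by this simple lexicon lookup.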
 ## Model Card Authors [optional]
+[More Information Needed]