# Model Card for Model ID (in progress)

This model is a fine-tuning of BETO uncased to detect offensive and discriminatory language against the LGBT community. It could be used as a moderation service in forums and digital spaces.

## Model Card Contact

[cmartinez@ucsc.cl]

## Model Details

### Model description process

- Initial collection of discriminatory phrases targeting the LGBTQIA+ community from X/Twitter, Instagram, and TikTok (197 phrases).
- Labeling by 3 raters as non-lgbtphobic (0) or lgbtphobic (1).
- Text augmentation using back-translation and random synonym replacement.
- Translation into Spanish of part of the McGiff, J., & Nikolov, N. S. (2024) dataset, which was added under the CC-BY-4.0 license.
- The result is 1234 tagged phrases for version 1.0.1 of LGBTQIAphobia_augmented.

Please cite the data set as:

Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166

- **Developed by:** [Martínez-Araneda, C.; Segura Navarrete, A.; Gutiérrez Valenzuela, Mariella; Maldonado Montiel, Diego; Gómez Meneses, P.; Vidal-Castro, Christian]
- **Model type:** [text-classification]
- **Language(s) (NLP):** [Spanish]
- **License:** [CC-BY-4.0]
- **Finetuned from model:** [dccuchile/bert-base-spanish-wwm-uncased]. More information on the base model: https://github.com/dccuchile/beto

### Model Sources [optional]

## Uses

This model can be used to detect offensive and discriminatory language against the LGBT community. It could be used as a moderation service in forums and digital spaces.

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

This model carries its own bias because it was fine-tuned on a small data set.
[More Information Needed]

### Recommendations

## How to Get Started with the Model

```python
# Libraries
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Path from which the model will be loaded
load_directory = "./lgbetO"

# Load the fine-tuned model
model = AutoModelForSequenceClassification.from_pretrained(load_directory)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(load_directory)
```

## Training Details

Training starts from phrases about the LGBT community, both offensive or discriminatory and not, collected from Twitter, Instagram, and TikTok. The phrases are preprocessed, labeled by 3 raters, and augmented with back-translation and synonym replacement. The BETO base model (dccuchile/bert-base-spanish-wwm-uncased) is then fine-tuned on them to detect discriminatory phrases against the LGBT community.

### Training Data

Citation:

Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2024). LGBTQIAphobia dataset (augmented) (1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14563166

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
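
As a rough check, the estimate reported in this section (10 hours on a T4 with a TDP of 70 W, at a carbon efficiency of 0.2 kgCO$_2$eq/kWh) can be reproduced as energy consumed multiplied by the carbon efficiency of the region. The sketch below assumes the GPU draws its full TDP for the entire run and ignores data-center overhead. The function name `estimate_emissions_kg` is hypothetical and not part of any library.

```python
def estimate_emissions_kg(hours: float, tdp_watts: float, kg_co2eq_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times regional carbon efficiency.

    Assumption: the device runs at its full TDP for the whole duration.
    """
    energy_kwh = hours * tdp_watts / 1000.0  # W * h -> kWh
    return energy_kwh * kg_co2eq_per_kwh


# Figures taken from this model card: 10 h, T4 (70 W TDP), 0.2 kgCO2eq/kWh
emissions = estimate_emissions_kg(hours=10, tdp_watts=70, kg_co2eq_per_kwh=0.2)
print(f"{emissions:.2f} kgCO2eq")  # prints "0.14 kgCO2eq"
```

Under these assumptions the run consumes 0.7 kWh, which gives the 0.14 kgCO$_2$eq total stated in this card.
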
- **Hardware Type:** T4 GPU (TDP of 70 W)
- **Hours used:** 10
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** southamerica-east1
- **Carbon Emitted:** 0.14 kgCO$_2$eq

Experiments were conducted using Google Cloud Platform in region southamerica-east1, which has a carbon efficiency of 0.2 kgCO$_2$eq/kWh. A cumulative total of 10 hours of computation was performed on hardware of type T4 (TDP of 70 W). Total emissions are estimated to be 0.14 kgCO$_2$eq, of which 100 percent was directly offset by the cloud provider.

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

Google Compute Engine backend (GPU), Python 3.

#### Hardware

- RAM: 3.87 GB / 12.67 GB
- Disk: 33.96 GB / 112.64 GB

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]