Hate-speech-CNERG/deoffxlmr-mono-tamil


Punyajoy committed on Sep 25, 2021

Commit 4b04b56 • 1 Parent(s): 31a1805

Update README.md

Files changed (1)

README.md +4 -0


@@ -1,3 +1,7 @@
+---
+language: ta
+license: apache-2.0
+---
 This model detects **offensive content** in **Tamil code-mixed text**. The "mono" in the name refers to the monolingual setting, in which the model is trained only on Tamil (pure and code-mixed) data. The weights are initialized from pretrained XLM-RoBERTa-Base, further pretrained with masked language modelling on the target dataset, and then fine-tuned with cross-entropy loss.
 This model is the best of several trained for the **EACL 2021 Shared Task on Offensive Language Identification in Dravidian Languages**. Test predictions from a genetic-algorithm-based ensemble achieved the highest weighted F1 score on the leaderboard (weighted F1 on the held-out test set: this model 0.76, ensemble 0.78).
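
The weighted F1 metric quoted above can be illustrated with a minimal, dependency-free sketch. It assumes the usual definition (per-class F1 averaged with weights proportional to each class's support in the gold labels); it is not the shared task's official scorer, and the function name `weighted_f1` and the labels `"off"` and `"not"` are hypothetical placeholders rather than the task's actual label set.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Assumption: classes are taken from the gold labels, so a class that
    appears only in the predictions contributes zero weight.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)  # number of gold examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: gold and prediction both equal this label.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by its share of the gold labels.
        score += (n_true / total) * f1
    return score


# Hypothetical toy labels, for illustration only.
gold = ["off", "not", "not", "off", "not"]
pred = ["off", "not", "off", "off", "not"]
print(round(weighted_f1(gold, pred), 2))
```

Because the weights follow class frequency, this metric is dominated by the majority class on imbalanced data, which is worth keeping in mind when reading the 0.76 and 0.78 figures.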