---
language:
- en
pipeline_tag: text-classification
tags:
- hate
- hate_speech
---

# Social Network Hate Detection: Finding Social Media Posts Containing Hateful Information Using Ensemble Methods and Back-Translation

Recent research efforts have been directed toward the development of automated systems for detecting hateful content, to assist social media providers in identifying and removing such content before it can be viewed by the public. This paper introduces an ensemble approach built on DeBERTa models that benefits from pre-training on massive synthetic data and from back-translation applied during both training and testing. Our findings show that this approach delivers state-of-the-art results in hate-speech detection. Combining back-translation, ensembling, and test-time augmentation yields a considerable improvement across various metrics and models on both the Parler and GAB datasets. We show that our method reduces the models' bias in an effective and meaningful way, reduces the RMSE from 0.838 to around 0.766, and increases R-squared from 0.520 to 0.599. The biggest improvement was seen in the small DeBERTa models; for the large models, there was either a minor improvement or no change.
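The combination described in the abstract (several models, each scoring the original post and its back-translated variants, with the scores averaged) can be sketched roughly as follows, together with the two reported metrics. This is a minimal illustration under stated assumptions, not the repository's implementation: `ensemble_tta_score`, `rmse`, and `r_squared` are hypothetical names, the models are stand-in callables, and plain averaging is assumed as the way predictions are combined.

```python
import math
from statistics import mean
from typing import Callable, Sequence


def ensemble_tta_score(
    models: Sequence[Callable[[str], float]],
    variants: Sequence[str],
) -> float:
    """Average the score over every (model, text variant) pair.

    `variants` is assumed to hold the original post plus its
    back-translated versions; each model maps a text to one score.
    """
    if not models or not variants:
        raise ValueError("need at least one model and one text variant")
    return mean([m(v) for m in models for v in variants])


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Averaging over both axes at once means each model contributes equally regardless of how many back-translated variants are available; weighted schemes are possible but are not described in the text above.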
## Results

```
!pip install huggingface_hub
!pip install tokenizers transformers
!pip install iterative-stratification
!git clone https://github.com/OrKatz7/parler-hate-speech
%cd parler-hate-speech/src
```

```
from huggingface_hub import hf_hub_download
import torch
import sys
# CustomModel and MeanPooling come from the cloned repository (src/model.py);
# they must be importable so that torch.load can unpickle the checkpoint.
from model import CustomModel, MeanPooling
from transformers import AutoTokenizer, AutoModel, AutoConfig
import numpy as np

class CFG:
    model = "microsoft/deberta-v3-base"
    target_cols = ['label_mean']
```

```
name = "OrK7/parler_hate_speech"
downloaded_model_path = hf_hub_download(repo_id=name, filename="pytorch_model.bin")
# The checkpoint is loaded as a full model object, not a state dict. On recent
# PyTorch releases, where torch.load defaults to weights_only=True, you may
# need to pass weights_only=False here.
model = torch.load(downloaded_model_path)
tokenizer = AutoTokenizer.from_pretrained(name)
```

```
def prepare_input(text):
    # Tokenize one text and pad/truncate it to 512 tokens.
    inputs = tokenizer.encode_plus(
        text,
        return_tensors=None,
        add_special_tokens=True,
        max_length=512,
        padding="max_length",  # replaces the deprecated pad_to_max_length=True
        truncation=True
    )
    # Convert each field to a (1, seq_len) LongTensor.
    for k, v in inputs.items():
        inputs[k] = torch.tensor(np.array(v).reshape(1, -1), dtype=torch.long)
    return inputs

def collate(inputs):
    # Trim the padding down to the longest real sequence in the batch.
    mask_len = int(inputs["attention_mask"].sum(axis=1).max())
    for k, v in inputs.items():
        inputs[k] = inputs[k][:, :mask_len]
    return inputs
```

```
from transformers import Pipeline

class HatePipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs):
        out = prepare_input(inputs)
        return collate(out)

    def _forward(self, model_inputs):
        outputs = self.model(model_inputs)
        return outputs

    def postprocess(self, model_outputs):
        # Clip the raw output to [0, 1], then rescale it to the 1-5 range.
        return np.array(model_outputs[0, 0].numpy()).clip(0, 1) * 4 + 1
```

```
pipe = HatePipeline(model=model)
pipe("I Love you #")
```

results: 1.0

```
pipe("I Hate #$%#$%Jewish%$#@%^^@#")
```

results: 4.155200004577637
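To make the reported scores easier to read, the rescaling done in `postprocess` can be written out on its own. The sketch below uses plain Python rather than NumPy, and `to_label_scale` and `from_label_scale` are hypothetical helper names introduced only for illustration; the assumption that the 1-5 range mirrors the `label_mean` annotation scale is ours and is not stated in the code above.

```python
def to_label_scale(raw: float) -> float:
    """Clip a raw model output to [0, 1] and rescale it to [1, 5].

    Mirrors the arithmetic in HatePipeline.postprocess: clip(0, 1) * 4 + 1.
    """
    clipped = min(max(raw, 0.0), 1.0)
    return clipped * 4 + 1


def from_label_scale(score: float) -> float:
    """Invert the rescaling for scores that lie inside [1, 5]."""
    return (score - 1) / 4
```

Under this mapping, a score of 1.0 means the raw output was at or below 0, while the second example's score of about 4.155 corresponds to a raw output of roughly 0.79.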