File size: 3,960 Bytes
7bffaaf
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
import streamlit as st

title = "Welcome Page"
description = "Introduction"
date = "2022-01-26"
thumbnail = "images/waving_hand.png"

__INTRO_TEXT = """
Welcome to the Task Exploration Activity for hate speech detection!
In this series of modules, you'll learn about the history of hate speech detection as a task in
the larger pipeline of automatic content moderation (ACM).
You'll also be able to interact with and compare datasets and models built for this task.

The goal of this exploration is to share the design considerations and challenges faced when using algorithms to detect hate speech.
"""

__DEF_HATE_SPEECH = """
Hate speech is hard to define, with definitions shifting across time and location.
In 2019, the United Nations defined hate speech as "any kind of communication in speech,
writing or behaviour, that attacks or uses pejorative or discriminatory language with
reference to a person or a group on the basis of who they are, in other words, based on their religion,
ethnicity, nationality, race, colour, descent, gender or other identity factor."
"""

__DEF_CONTENT = """
Different platforms have different guidelines about what
content is sanctioned on the platform. For example, many US-based platforms prohibit posting threats of violence,
nudity, and hate speech. We discuss hate speech below.
"""

__CONTENT_WARNING = """
These modules contain examples of hateful, abusive, and offensive language that have be collected in datasets and 
reproduced by models. These examples are meant to illustrate the variety of content that may be subject to 
moderation.

"""

__DATASET_LIST = """
- [FRENK hate speech dataset](https://huggingface.co/datasets/classla/FRENK-hate-en)
- [Twitter Hate Speech dataset](https://huggingface.co/datasets/tweets_hate_speech_detection)
- [UC Berkley Measuring Hate Speech](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech)
- [Dynamically Generated Hate Speech Dataset](https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset)
- [HateCheck](https://github.com/paul-rottger/hatecheck-data)
- [Hateful Memes Dataset](https://huggingface.co/datasets/limjiayi/hateful_memes_expanded)
- [Open Subtitles English Dataset](https://opus.nlpl.eu/OpenSubtitles-v2018.php)
"""

__MODEL_LIST = """
- [RoBERTa trained on FRENK dataset](https://huggingface.co/classla/roberta-base-frenk-hate)
- [RoBERTa trained on Twitter Hate Speech](https://huggingface.co/cardiffnlp/twitter-roberta-base-hate)
- [DeHateBERT model (trained on Twitter and StormFront)](https://huggingface.co/Hate-speech-CNERG/dehatebert-mono-english)
- [RoBERTa trained on 11 English hate speech datasets](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r1-target)
- [RoBERTa trained on 11 English hate speech datasets and Round 1 of the Dynamically Generated Hate Speech Dataset](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r2-target)
- [RoBERTa trained on 11 English hate speech datasets and Rounds 1 and 2 of the Dynamically Generated Hate Speech Dataset](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r3-target)
- [RoBERTa trained on 11 English hate speech datasets and Rounds 1, 2, and 3 of the Dynamically Generated Hate Speech Dataset](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target)
"""

def run_article():
    st.markdown("# Welcome!")
    st.markdown(__INTRO_TEXT)
    st.markdown("### What is hate speech?")
    st.markdown(__DEF_HATE_SPEECH)
    st.markdown("### What kind of content is subject to moderation?")
    st.markdown(__DEF_CONTENT)
    st.markdown("### Content Warning")
    st.markdown(__CONTENT_WARNING)
    st.markdown("---\n\n## Featured datasets and models")
    col_1, col_2, _ = st.columns(3)
    with col_1:
        st.markdown("### Datasets")
        st.markdown(__DATASET_LIST, unsafe_allow_html=True)
    with col_2:
        st.markdown("### Models")
        st.markdown(__MODEL_LIST, unsafe_allow_html=True)