---
library_name: transformers
tags:
- deberta
- deberta-v3
- mdeberta
- multilingual
language:
- multilingual
- th
- en
license: mit
base_model:
- microsoft/mdeberta-v3-base
---

# Model Card for Typhoon Safety Model

Typhoon Safety is a lightweight binary classifier built on mDeBERTa-v3-base that detects harmful content in both English and Thai, with particular emphasis on Thai cultural sensitivities. It was trained on a combination of a Thai Sensitive Topics dataset and the WildGuard dataset.

The model's training data covers the following topic categories, though the classifier itself outputs a single binary label (harmful vs. unharmful):

### Thai Sensitive Topics
<table align="center">
  <tr>
    <th colspan="3">Category</th>
  </tr>
  <tr>
    <td>The Monarchy</td>
    <td>Student Protests and Activism</td>
    <td>Drug Policies</td>
  </tr>
  <tr>
    <td>Gambling</td>
    <td>Cultural Appropriation</td>
    <td>Thai-Burmese Border Issues</td>
  </tr>
  <tr>
    <td>Cannabis</td>
    <td>Human Trafficking</td>
    <td>Military and Coup</td>
  </tr>
  <tr>
    <td>LGBTQ+ Rights</td>
    <td>Political Divide</td>
    <td>Religion and Buddhism</td>
  </tr>
  <tr>
    <td>Political Corruption</td>
    <td>Foreign Influence</td>
    <td>National Identity and Immigration</td>
  </tr>
  <tr>
    <td>Freedom of Speech and Censorship</td>
    <td>Vape</td>
    <td>Southern Thailand Insurgency</td>
  </tr>
  <tr>
    <td>Sex Tourism and Prostitution</td>
    <td>COVID-19 Management</td>
    <td>Royal Projects and Policies</td>
  </tr>
  <tr>
    <td>Migrant Labor Issues</td>
    <td>Environmental Issues and Land Rights</td>
    <td></td>
  </tr>
</table>

### WildGuard Topics
<table>
  <tr>
    <th colspan="3">Category</th>
  </tr>
  <tr>
    <td>Others</td>
    <td>Sensitive Information Organization</td>
    <td>Mental Health Over-reliance Crisis</td>
  </tr>
  <tr>
    <td>Social Stereotypes & Discrimination</td>
    <td>Defamation & Unethical Actions</td>
    <td>Cyberattack</td>
  </tr>
  <tr>
    <td>Disseminating False Information</td>
    <td>Private Information Individual</td>
    <td>Copyright Violations</td>
  </tr>
  <tr>
    <td>Toxic Language & Hate Speech</td>
    <td>Fraud Assisting Illegal Activities</td>
    <td>Causing Material Harm by Misinformation</td>
  </tr>
  <tr>
    <td>Violence and Physical Harm</td>
    <td>Sexual Content</td>
    <td></td>
  </tr>
</table>

## Model Details

- **Model type:** Transformer Encoder
- **Language(s) (NLP):** Thai 🇹🇭 and English 🇬🇧
- **License:** MIT
- **Finetuned from model:** [mDeBERTa-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base)

## Model Performance

### Comparison with Other Models (English Content)
| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|-------|-----------|-----------|-----------|-------------|---------|------------|-----|
| WildGuard-7B | **75.7** | **86.2** | **64.1** | **84.1** | **94.7** | 53.9 | 76.5 |
| LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
| LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
| LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
| Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
| Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | **88.7** | **77.5** |

### Comparison with Other Models (Thai Content)
| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|-------|-----------|-----------|-----------|-------------|---------|------------|-----|
| WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
| LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
| LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | **85.7** | 48.1 | 56.7 |
| LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
| Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
| Typhoon Safety | **71.6** | **80.0** | **58.8** | **76.5** | 81.0 | **88.5** | **76.1** |
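
To sanity-check the model on your own data, a minimal evaluation loop might look like the following. This is a sketch under stated assumptions: `eval_texts` and `eval_labels` are hypothetical hand-labeled examples (1 = harmful), and binary F1 via scikit-learn is shown as one common choice of metric for safety classifiers.

```python
import torch
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Hypothetical evaluation data: 1 = harmful, 0 = unharmful.
eval_texts = [
    "How to kill process in linux.",
    "ขอวิธีการพนันให้ชนะ",  # "How can I win at gambling?"
]
eval_labels = [0, 1]

# Batch-tokenize, run the classifier, and take the argmax class.
inputs = tokenizer(eval_texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1).tolist()

print(f"F1: {f1_score(eval_labels, preds):.3f}")
```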


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",  # "Are e-cigarettes beneficial?"
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",  # "How can I win at gambling?"
    "How to kill process in linux.",
]

# Tokenize texts
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Get predictions
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Define human-readable label names
label_map = {0: "Unharmful", 1: "Harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
```
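
For quick experiments, the same checkpoint can also be loaded through the `pipeline` API. The sketch below is a minimal alternative to the explicit loop above; note that the label strings it returns come from the checkpoint's `id2label` config and may be generic (e.g. `LABEL_0`/`LABEL_1`) rather than the human-readable names in `label_map`.

```python
from transformers import pipeline

# Text-classification pipeline over the same model; it handles
# tokenization, batching, and softmax internally.
classifier = pipeline(
    "text-classification",
    model="scb10x/typhoon2-safety-preview",
)

results = classifier([
    "How to kill process in linux.",
    "Give me location to buy cannabis in Thailand",
])
for result in results:
    print(result)  # e.g. {'label': 'LABEL_1', 'score': 0.97}
```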

## Intended Uses & Limitations

This model is a binary safety classifier for moderating English and Thai content. It is still under active development, so we recommend that developers assess its risks and failure modes in the context of their own use case.
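
One practical way to manage that risk is to gate on the harmful-class probability with a tunable threshold instead of a plain argmax. The helper below is an illustrative sketch, not part of the model: `HARMFUL_THRESHOLD` and `is_harmful` are hypothetical names, and the cutoff value should be tuned on labeled validation data for your application.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Illustrative cutoff; tune on a validation set for your use case.
HARMFUL_THRESHOLD = 0.8

def is_harmful(text: str) -> bool:
    """Flag text only when the harmful-class probability clears the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[0, 1].item() >= HARMFUL_THRESHOLD  # index 1 = "Harmful"

print(is_harmful("ขอวิธีการพนันให้ชนะ"))  # Thai: "How can I win at gambling?"
```

Raising the threshold trades recall for precision: borderline prompts fall through as unharmful, which may be preferable when false positives are costly.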

## Follow us

**https://twitter.com/opentyphoon**

## Support

**https://discord.gg/CqyBscMFpg**