metadata
library_name: transformers
tags: []
Typhoon Safety Model
Typhoon Safety is a lightweight binary classifier that detects harmful content in both English and Thai, with special attention to Thai cultural sensitivities. It is built on mDeBERTa-v3-base.
It was trained on a mix of a Thai sensitive-topics dataset and WildGuard.
Thai Sensitive Topics Distribution
Category | English Samples | Thai Samples |
---|---|---|
The Monarchy | 1,380 | 352 |
Gambling | 1,075 | 264 |
Cannabis | 818 | 201 |
Drug Policies | 448 | 111 |
Thai-Burmese Border Issues | 442 | 119 |
Military and Coup d'États | 297 | 72 |
LGBTQ+ Rights | 275 | 75 |
Religion and Buddhism | 252 | 57 |
Political Corruption | 237 | 58 |
Freedom of Speech and Censorship | 218 | 56 |
National Identity and Immigration | 216 | 57 |
Southern Thailand Insurgency | 211 | 56 |
Sex Tourism and Prostitution | 198 | 55 |
Student Protests and Activism | 175 | 44 |
Cultural Appropriation | 171 | 42 |
Human Trafficking | 158 | 39 |
Political Divide | 156 | 43 |
Foreign Influence | 124 | 30 |
Vape | 127 | 24 |
COVID-19 Management | 105 | 27 |
Migrant Labor Issues | 79 | 23 |
Royal Projects and Policies | 55 | 17 |
Environmental Issues and Land Rights | 19 | 5 |
Total | 9,321 | 4,563 |
Model Details
Model Description
Model Performance
Comparison with Other Models (English Content)
Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
---|---|---|---|---|---|---|---|
WildGuard-7B | 75.7 | 86.2 | 64.1 | 84.1 | 94.7 | 53.9 | 76.5 |
LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | 88.7 | 77.5 |
Comparison with Other Models (Thai Content)
Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
---|---|---|---|---|---|---|---|
WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | 85.7 | 48.1 | 56.7 |
LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
Typhoon Safety | 71.6 | 80.0 | 58.8 | 76.5 | 81.0 | 88.5 | 76.1 |
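The AVG column appears to be the unweighted mean of the six benchmark scores in each row. The following is a minimal sketch that checks this reading against the Typhoon Safety rows, using scores copied from the two tables above. The helper name `row_average` is illustrative, and the unweighted-mean interpretation is an assumption rather than something the card states.

```python
# Scores copied from the Typhoon Safety rows of the two comparison tables.
# Column order: WildGuard, HarmBench, SafeRLHF, BeaverTails, XSTest, Thai Topic.
typhoon_english = [74.0, 81.7, 61.0, 78.2, 81.2, 88.7]
typhoon_thai = [71.6, 80.0, 58.8, 76.5, 81.0, 88.5]


def row_average(scores):
    """Unweighted mean of one table row, rounded to one decimal place.

    Assumption: AVG is a plain mean over the six benchmarks.
    """
    return round(sum(scores) / len(scores), 1)


print(row_average(typhoon_english))  # 77.5, the AVG reported for English content
print(row_average(typhoon_thai))     # 76.1, the AVG reported for Thai content
```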
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",
    "How to kill process in linux.",
]

# Tokenize texts (pad to the longest text in the batch, truncate overlong inputs)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get predictions: softmax over the two classes, then take the most likely one
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Define label mapping
label_map = {0: "Unharm", 1: "harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
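The snippet above assigns each text to whichever class has the higher probability. If an application needs a different balance between missed harmful content and false alarms, one option is to threshold the harmful-class probability instead. The sketch below works on raw logit pairs (for example `outputs.logits.tolist()`) in plain Python. It assumes index 1 is the harmful class, as in the label mapping above. The helper names `harmful_probability` and `classify`, the example logits, and the threshold values are all hypothetical.

```python
import math


def harmful_probability(logit_pair):
    """Softmax probability of the harmful class (index 1) for one example."""
    safe_logit, harmful_logit = logit_pair
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(safe_logit, harmful_logit)
    exp_safe = math.exp(safe_logit - m)
    exp_harm = math.exp(harmful_logit - m)
    return exp_harm / (exp_safe + exp_harm)


def classify(logits, threshold=0.5):
    """Flag each example as harmful when P(harmful) >= threshold."""
    return [harmful_probability(pair) >= threshold for pair in logits]


# Made-up logits for illustration; in practice pass outputs.logits.tolist().
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.2, 0.4]]
print(classify(example_logits))                 # default threshold of 0.5
print(classify(example_logits, threshold=0.8))  # stricter threshold
```

Raising the threshold flags fewer texts as harmful; a suitable value would need to be chosen on held-out data for the target application.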