metadata
library_name: transformers
tags:
- deberta
- deberta-v3
- mdeberta
- multilingual
language:
- multilingual
- th
- en
license: mit
base_model:
- microsoft/mdeberta-v3-base
Model Card for Typhoon Safety Model
Typhoon Safety Model
Typhoon Safety is a lightweight binary classifier built on mDeBERTa-v3-base that detects harmful content in English and Thai, with particular emphasis on Thai cultural sensitivities. The model was trained on a combination of a Thai Sensitive Topics dataset and the WildGuard dataset.
The model predicts a binary safety label for content in the following categories:
Thai Sensitive Topics
Category | | |
---|---|---|
The Monarchy | Student Protests and Activism | Drug Policies |
Gambling | Cultural Appropriation | Thai-Burmese Border Issues |
Cannabis | Human Trafficking | Military and Coup |
LGBTQ+ Rights | Political Divide | Religion and Buddhism |
Political Corruption | Foreign Influence | National Identity and Immigration |
Freedom of Speech and Censorship | Vape | Southern Thailand Insurgency |
Sex Tourism and Prostitution | COVID-19 Management | Royal Projects and Policies |
Migrant Labor Issues | Environmental Issues and Land Rights | |
WildGuard Topics
Category | | |
---|---|---|
Others | Sensitive Information Organization | Mental Health Over-reliance Crisis |
Social Stereotypes & Discrimination | Defamation & Unethical Actions | Cyberattack |
Disseminating False Information | Private Information Individual | Copyright Violations |
Toxic Language & Hate Speech | Fraud Assisting Illegal Activities | Causing Material Harm by Misinformation |
Violence and Physical Harm | Sexual Content | |
Model Details
Model Performance
Comparison with Other Models (English Content)
Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
---|---|---|---|---|---|---|---|
WildGuard-7B | 75.7 | 86.2 | 64.1 | 84.1 | 94.7 | 53.9 | 76.5 |
LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | 88.7 | 77.5 |
Comparison with Other Models (Thai Content)
Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
---|---|---|---|---|---|---|---|
WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | 85.7 | 48.1 | 56.7 |
LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
Typhoon Safety | 71.6 | 80.0 | 58.8 | 76.5 | 81.0 | 88.5 | 76.1 |
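The AVG column is consistent with an unweighted mean of the six benchmark columns. The card does not state how AVG was computed, so treat this as an assumption. The short check below reproduces the two Typhoon Safety averages from the scores in the tables above; the helper name `average_score` is hypothetical.

```python
# Scores copied from the "Typhoon Safety" rows of the English and Thai tables.
# Column order: WildGuard, HarmBench, SafeRLHF, BeaverTails, XSTest, Thai Topic.
typhoon_english = [74.0, 81.7, 61.0, 78.2, 81.2, 88.7]
typhoon_thai = [71.6, 80.0, 58.8, 76.5, 81.0, 88.5]


def average_score(scores):
    """Unweighted mean of per-benchmark scores, rounded to one decimal.

    Assumption: this is how the AVG column was derived; the card does not say.
    """
    return round(sum(scores) / len(scores), 1)


print(average_score(typhoon_english))  # 77.5
print(average_score(typhoon_thai))     # 76.1
```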
- Model type: Transformer Encoder
- Language(s) (NLP): Thai 🇹🇭 and English 🇬🇧
- License: MIT
- Finetuned from model: mDeBERTa v3 base https://huggingface.co/microsoft/mdeberta-v3-base
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",
    "How to kill process in linux.",
]

# Tokenize texts as one padded batch
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities, then take the top class and its score
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Define label mapping
label_map = {0: "Unharm", 1: "Harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
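The snippet above picks the class with the highest probability, which for two classes is equivalent to a 0.5 threshold. Some applications may prefer a stricter or looser cutoff on the harmful-class probability. The following is a minimal, dependency-free sketch of that idea. It assumes index 1 is the harmful class, as in `label_map`. The function names and the example logits are hypothetical, and no threshold value here is a recommendation from the model authors.

```python
import math

HARMFUL_INDEX = 1  # assumption: matches label_map above (0 = "Unharm", 1 = "Harmful")


def harmful_probability(logits):
    """Softmax over one example's logits; returns the harmful-class probability."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return exps[HARMFUL_INDEX] / sum(exps)


def flag_harmful(logits, threshold=0.5):
    """Flag a text when its harmful-class probability reaches the threshold."""
    return harmful_probability(logits) >= threshold


# Hypothetical logits for a single text, e.g. one row of outputs.logits.tolist()
example_logits = [0.2, 1.1]
print(f"P(harmful) = {harmful_probability(example_logits):.4f}")
print("Flagged at 0.5:", flag_harmful(example_logits))
print("Flagged at 0.9:", flag_harmful(example_logits, threshold=0.9))
```

Raising the threshold reduces false alarms but lets more harmful text through. The right trade-off depends on the application and is best chosen on a labeled sample of your own data.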
Intended Uses & Limitations
This model is a binary classifier that is still under development, so it may misclassify some inputs. We recommend that developers assess these risks in the context of their use case.
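One way to assess the model for a specific use case is to compare its predictions with a small hand-labeled sample and compute precision, recall, and F1 for the harmful class. The sketch below is illustrative only: `binary_metrics` is a hypothetical helper, the `gold` and `predicted` lists are made-up placeholders rather than real results, and label 1 is assumed to mean harmful.

```python
def binary_metrics(gold, predicted, positive=1):
    """Precision, recall, and F1 for the positive (harmful) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder data: replace with your own labels and the model's predictions.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(gold, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```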
Follow us
https://twitter.com/opentyphoon