metadata

library_name: transformers
tags:
  - deberta
  - deberta-v3
  - mdeberta
  - multilingual
language:
  - multilingual
  - th
  - en
license: mit
base_model:
  - microsoft/mdeberta-v3-base

Model Card for Typhoon Safety Model

Typhoon Safety Model

Typhoon Safety is a lightweight binary classifier built on mDeBERTa-v3-base that detects harmful content in English and Thai, with particular emphasis on Thai cultural sensitivities. It was trained on a combination of a Thai Sensitive Topics dataset and the WildGuard dataset.

The model is designed to predict safety labels across the following categories:

Thai Sensitive Topics

Category
The Monarchy	Student Protests and Activism	Drug Policies
Gambling	Cultural Appropriation	Thai-Burmese Border Issues
Cannabis	Human Trafficking	Military and Coup
LGBTQ+ Rights	Political Divide	Religion and Buddhism
Political Corruption	Foreign Influence	National Identity and Immigration
Freedom of Speech and Censorship	Vape	Southern Thailand Insurgency
Sex Tourism and Prostitution	COVID-19 Management	Royal Projects and Policies
Migrant Labor Issues	Environmental Issues and Land Rights

WildGuard Topics

Category
Others	Sensitive Information Organization	Mental Health Over-reliance Crisis
Social Stereotypes & Discrimination	Defamation & Unethical Actions	Cyberattack
Disseminating False Information	Private Information Individual	Copyright Violations
Toxic Language & Hate Speech	Fraud Assisting Illegal Activities	Causing Material Harm by Misinformation
Violence and Physical Harm	Sexual Content

Model Details

Model Performance

Comparison with Other Models (English Content)

Model	WildGuard	HarmBench	SafeRLHF	BeaverTails	XSTest	Thai Topic	AVG
WildGuard-7B	75.7	86.2	64.1	84.1	94.7	53.9	76.5
LlamaGuard2-7B	66.5	77.7	51.5	71.8	90.7	47.9	67.7
LlamaGuard3-8B	70.1	84.7	45.0	68.0	90.4	46.7	67.5
LlamaGuard3-1B	28.5	62.4	66.6	72.9	29.8	50.1	51.7
Random	25.3	47.7	50.3	53.4	22.6	51.6	41.8
Typhoon Safety	74.0	81.7	61.0	78.2	81.2	88.7	77.5

Comparison with Other Models (Thai Content)

Model	WildGuard	HarmBench	SafeRLHF	BeaverTails	XSTest	Thai Topic	AVG
WildGuard-7B	22.3	40.8	18.3	27.3	49.5	42.2	33.4
LlamaGuard2-7B	64.0	75.5	46.1	65.0	85.1	45.8	63.6
LlamaGuard3-8B	61.4	37.5	42.4	65.3	85.7	48.1	56.7
LlamaGuard3-1B	28.4	62.4	66.7	72.9	29.8	50.9	51.8
Random	24.5	46.6	50.4	53.0	26.6	50.9	42.0
Typhoon Safety	71.6	80.0	58.8	76.5	81.0	88.5	76.1
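
The AVG column in both tables is consistent with an unweighted mean of the six benchmark columns, rounded to one decimal place. That reading is inferred from the numbers rather than stated in the card; a minimal check using the Typhoon Safety rows:

```python
# Typhoon Safety scores copied from the two tables above, in column order:
# WildGuard, HarmBench, SafeRLHF, BeaverTails, XSTest, Thai Topic.
english = [74.0, 81.7, 61.0, 78.2, 81.2, 88.7]
thai = [71.6, 80.0, 58.8, 76.5, 81.0, 88.5]


def average(scores):
    """Unweighted mean rounded to one decimal (assumed definition of AVG)."""
    return round(sum(scores) / len(scores), 1)


print(average(english))  # 77.5
print(average(thai))     # 76.1
```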

Model type: Transformer Encoder
Language(s) (NLP): Thai 🇹🇭 and English 🇬🇧
License: MIT
Finetuned from model: mDeBERTa v3 base https://huggingface.co/microsoft/mdeberta-v3-base

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",
    "How to kill process in linux.",
]

# Tokenize texts
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# Get predictions
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Define label mapping
label_map = {0: "Unharm", 1: "Harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
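
The snippet above assigns each text to whichever class has the higher probability. Some applications may instead want to flag a text only when the Harmful probability (class 1 in the label mapping above) reaches a chosen threshold. The following is a minimal sketch in plain Python, not part of the released code; the helper names, the threshold values, and the example logits are all hypothetical.

```python
import math


def harmful_probability(logits):
    """Softmax probability of class 1 ("Harmful") from a two-element logit row."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def flag_harmful(logit_rows, threshold=0.5):
    """Return True for each row whose Harmful probability reaches the threshold."""
    return [harmful_probability(row) >= threshold for row in logit_rows]


# Hypothetical logits; real ones could come from outputs.logits.tolist().
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.2, 0.4]]

print(flag_harmful(example_logits))                 # [False, True, True]
print(flag_harmful(example_logits, threshold=0.9))  # [False, False, False]
```

Raising the threshold trades missed harmful inputs for fewer false alarms; a suitable value depends on the application and is best chosen on labelled data from the target domain.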

Intended Uses & Limitations

This model is a binary classifier and is still under development, so it can misclassify inputs. We recommend that developers assess this risk in the context of their use case.
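
One way to assess the model for a specific use case is to run it on a small labelled sample from that domain and compute precision, recall, and F1 for the Harmful class. The sketch below assumes labels encoded as 0 (Unharm) and 1 (Harmful), matching the label mapping in the getting-started code; the function name and the example data are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (1 = Harmful)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # harmful, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # harmful, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions for six texts.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(gold, pred)
print({name: round(value, 3) for name, value in metrics.items()})
```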

https://twitter.com/opentyphoon

Support

https://discord.gg/CqyBscMFpg
