---
library_name: transformers
tags:
  - deberta
  - deberta-v3
  - mdeberta
  - multilingual
language:
  - multilingual
  - th
  - en
license: mit
base_model:
  - microsoft/mdeberta-v3-base
---

# Typhoon Safety Model

**Typhoon Safety** is a lightweight binary classifier built on mDeBERTa-v3-base that detects harmful content in both English and Thai, with particular emphasis on Thai cultural sensitivities. The model was trained on a combination of a Thai Sensitive Topics dataset and the WildGuard dataset.

The model is designed to predict safety labels across the following categories:

### Thai Sensitive Topics

| Category | | |
| --- | --- | --- |
| The Monarchy | Student Protests and Activism | Drug Policies |
| Gambling | Cultural Appropriation | Thai-Burmese Border Issues |
| Cannabis | Human Trafficking | Military and Coup |
| LGBTQ+ Rights | Political Divide | Religion and Buddhism |
| Political Corruption | Foreign Influence | National Identity and Immigration |
| Freedom of Speech and Censorship | Vape | Southern Thailand Insurgency |
| Sex Tourism and Prostitution | COVID-19 Management | Royal Projects and Policies |
| Migrant Labor Issues | Environmental Issues and Land Rights | |
### WildGuard Topics

| Category | | |
| --- | --- | --- |
| Others | Sensitive Information (Organization) | Mental Health Over-reliance Crisis |
| Social Stereotypes & Discrimination | Defamation & Unethical Actions | Cyberattack |
| Disseminating False Information | Private Information (Individual) | Copyright Violations |
| Toxic Language & Hate Speech | Fraud & Assisting Illegal Activities | Causing Material Harm by Misinformation |
| Violence and Physical Harm | Sexual Content | |

## Model Details

### Model Performance

#### Comparison with Other Models (English Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WildGuard-7B | 75.7 | 86.2 | 64.1 | 84.1 | 94.7 | 53.9 | 76.5 |
| LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
| LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
| LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
| Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
| Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | 88.7 | 77.5 |

#### Comparison with Other Models (Thai Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
| LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
| LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | 85.7 | 48.1 | 56.7 |
| LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
| Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
| Typhoon Safety | 71.6 | 80.0 | 58.8 | 76.5 | 81.0 | 88.5 | 76.1 |

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",  # "Are e-cigarettes beneficial?"
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",  # "Tell me how to win at gambling"
    "How to kill process in linux.",
]

# Tokenize the batch of texts
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities, then take the most likely class and its score
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Map class indices to human-readable labels
label_map = {0: "Unharm", 1: "Harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
```

## Intended Uses & Limitations

This model is a binary safety classifier for English and Thai text. However, it is still undergoing development, so misclassifications are possible. We recommend that developers assess these risks in the context of their use case.
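Because acceptable error trade-offs vary by application, one way to do such an assessment is to flag content only when the Harmful probability clears a tuned threshold, rather than taking the argmax as in the snippet above. A minimal sketch, assuming the label mapping shown earlier (index 1 = Harmful); the 0.8 default is illustrative, not a validated recommendation:

```python
import torch

HARMFUL_INDEX = 1  # assumes label_map above: {0: "Unharm", 1: "Harmful"}

def is_harmful(logits: torch.Tensor, threshold: float = 0.8) -> list[bool]:
    """Flag texts whose Harmful probability clears the threshold.

    The 0.8 default is illustrative only; tune it on your own validation data
    to trade off false positives against false negatives.
    """
    probs = torch.softmax(logits, dim=-1)
    return (probs[:, HARMFUL_INDEX] >= threshold).tolist()
```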

## Follow us

https://twitter.com/opentyphoon

## Support

https://discord.gg/CqyBscMFpg