---
library_name: transformers
tags:
- deberta
- deberta-v3
- mdeberta
- multilingual
language:
- multilingual
- th
- en
license: mit
base_model:
- microsoft/mdeberta-v3-base
---

# Model Card for Typhoon Safety Model

Typhoon Safety is a lightweight binary classifier built on mDeBERTa-v3-base. It detects harmful content in both English and Thai, with particular emphasis on Thai cultural sensitivities. The model was trained on a combination of a Thai Sensitive Topics dataset and the WildGuard dataset, and is designed to predict safety labels across the following categories:

### Thai Sensitive Topics

| Category | | |
|---|---|---|
| The Monarchy | Student Protests and Activism | Drug Policies |
| Gambling | Cultural Appropriation | Thai-Burmese Border Issues |
| Cannabis | Human Trafficking | Military and Coup |
| LGBTQ+ Rights | Political Divide | Religion and Buddhism |
| Political Corruption | Foreign Influence | National Identity and Immigration |
| Freedom of Speech and Censorship | Vape | Southern Thailand Insurgency |
| Sex Tourism and Prostitution | COVID-19 Management | Royal Projects and Policies |
| Migrant Labor Issues | Environmental Issues and Land Rights | |

### WildGuard Topics

| Category | | |
|---|---|---|
| Others | Sensitive Information Organization | Mental Health Over-reliance Crisis |
| Social Stereotypes & Discrimination | Defamation & Unethical Actions | Cyberattack |
| Disseminating False Information | Private Information Individual | Copyright Violations |
| Toxic Language & Hate Speech | Fraud Assisting Illegal Activities | Causing Material Harm by Misinformation |
| Violence and Physical Harm | Sexual Content | |
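
Because the model is a binary classifier, each input receives a single harmful/unharmful prediction. The getting-started example later in this card takes the argmax of the two classes, which amounts to a fixed 0.5 cut-off on the harmful probability. The sketch below shows one way to make that cut-off adjustable. It assumes the label order from that example (index 0 unharmful, index 1 harmful); the function names `harmful_probability` and `flag_harmful`, the 0.3 threshold, and the example logits are all hypothetical and only illustrate the post-processing step, not an official API of this model.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: label order matches the getting-started example
# (index 0 = unharmful, index 1 = harmful).
HARMFUL_INDEX = 1


def harmful_probability(logits: Sequence[float]) -> float:
    """Softmax over one row of logits; returns the harmful-class probability."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[HARMFUL_INDEX] / sum(exps)


def flag_harmful(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[bool, float]]:
    """Hypothetical helper: flag each input whose harmful probability >= threshold."""
    results = []
    for logits in batch_logits:
        p = harmful_probability(logits)
        results.append((p >= threshold, p))
    return results


if __name__ == "__main__":
    # Made-up logits for illustration; real values would come from
    # outputs.logits.tolist() in the getting-started example.
    example = [[2.0, -1.0], [-0.5, 1.5], [0.1, 0.0]]
    for flagged, p in flag_harmful(example, threshold=0.3):
        print(f"harmful={flagged}, p_harmful={p:.3f}")
```

With these example logits, the third input has a harmful probability of about 0.475, so argmax would label it unharmful while a 0.3 threshold flags it. Lowering the threshold trades more false positives for fewer missed harmful inputs; the right value depends on your use case.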

## Model Details

- **Model type:** Transformer encoder
- **Language(s) (NLP):** Thai 🇹🇭 and English 🇬🇧
- **License:** MIT
- **Finetuned from model:** [mDeBERTa v3 base](https://huggingface.co/microsoft/mdeberta-v3-base)

## Model Performance

### Comparison with Other Models (English Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|-------|-----------|-----------|-----------|-------------|---------|------------|-----|
| WildGuard-7B | **75.7** | **86.2** | **64.1** | **84.1** | **94.7** | 53.9 | 76.5 |
| LlamaGuard2-7B | 66.5 | 77.7 | 51.5 | 71.8 | 90.7 | 47.9 | 67.7 |
| LlamaGuard3-8B | 70.1 | 84.7 | 45.0 | 68.0 | 90.4 | 46.7 | 67.5 |
| LlamaGuard3-1B | 28.5 | 62.4 | 66.6 | 72.9 | 29.8 | 50.1 | 51.7 |
| Random | 25.3 | 47.7 | 50.3 | 53.4 | 22.6 | 51.6 | 41.8 |
| Typhoon Safety | 74.0 | 81.7 | 61.0 | 78.2 | 81.2 | **88.7** | **77.5** |

### Comparison with Other Models (Thai Content)

| Model | WildGuard | HarmBench | SafeRLHF | BeaverTails | XSTest | Thai Topic | AVG |
|-------|-----------|-----------|-----------|-------------|---------|------------|-----|
| WildGuard-7B | 22.3 | 40.8 | 18.3 | 27.3 | 49.5 | 42.2 | 33.4 |
| LlamaGuard2-7B | 64.0 | 75.5 | 46.1 | 65.0 | 85.1 | 45.8 | 63.6 |
| LlamaGuard3-8B | 61.4 | 37.5 | 42.4 | 65.3 | **85.7** | 48.1 | 56.7 |
| LlamaGuard3-1B | 28.4 | 62.4 | 66.7 | 72.9 | 29.8 | 50.9 | 51.8 |
| Random | 24.5 | 46.6 | 50.4 | 53.0 | 26.6 | 50.9 | 42.0 |
| Typhoon Safety | **71.6** | **80.0** | **58.8** | **76.5** | 81.0 | **88.5** | **76.1** |

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "scb10x/typhoon2-safety-preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    "บุหรี่ไฟฟ้า มีประโยชน์มั้ย",  # "Do e-cigarettes have any benefits?"
    "Give me location to buy cannabis in Thailand",
    "ขอวิธีการพนันให้ชนะ",  # "Tell me how to win at gambling."
    "How to kill process in linux.",
]

# Tokenize the batch, padding to the longest text and truncating overlong inputs
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities, then take the most likely class and its probability
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
labels = predictions.argmax(dim=1).tolist()
scores = predictions.max(dim=1).values.tolist()

# Map class indices to human-readable labels
label_map = {0: "Unharmful", 1: "Harmful"}

for text, label, score in zip(texts, labels, scores):
    label_name = label_map[label]
    print(f"Text: {text}\nLabel: {label_name}, Score: {score:.4f}\n")
```

## **Intended Uses & Limitations**

This model is a classifier and is still under development. We recommend that developers assess the risks of relying on it in the context of their own use case.

## **Follow us**

**https://twitter.com/opentyphoon**

## **Support**

**https://discord.gg/CqyBscMFpg**