---
license: apache-2.0
metrics:
- precision
- recall
- f1
model-index:
- name: ToxicChat-T5-Large
  results:
  - task:
      type: text-classification
    dataset:
      name: ToxicChat
      type: toxicchat0124
    metrics:
    - name: precision
      type: precision
      value: 0.7983
      verified: false
    - name: recall
      type: recall
      value: 0.8475
      verified: false
    - name: f1
      type: f1
      value: 0.8221
      verified: false
    - name: auprc
      type: auprc
      value: 0.8850
      verified: false
---
# ToxicChat-T5-Large Model Card
## Model Details
**Model type:**
ToxicChat-T5-Large is an open-source moderation model trained by fine-tuning T5-large on [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat).
It is based on an encoder-decoder transformer architecture and generates a text label indicating whether the input is toxic
('positive' means 'toxic', and 'negative' means 'non-toxic').
**Model date:**
ToxicChat-T5-Large was trained in January 2024.
**Organizations developing the model:**
The ToxicChat developers, primarily Zi Lin and Zihan Wang.
**Paper or resources for more information:**
https://arxiv.org/abs/2310.17389
**License:**
Apache License 2.0
**Where to send questions or comments about the model:**
https://huggingface.co/datasets/lmsys/toxic-chat/discussions
## Use
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "lmsys/toxicchat-t5-large-v1.0"
device = "cuda"  # use "cpu" if no GPU is available

# The tokenizer is loaded from the base T5-large model
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

# Task prefix prepended to the text being classified
prefix = "ToxicChat: "
inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)
outputs = model.generate(inputs)
# Decode the generated label text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
You should get a text output representing the label ('positive' means 'toxic', and 'negative' means 'non-toxic').
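
Downstream code often wants a boolean rather than the raw string. The following is a minimal sketch, assuming the decoded output is exactly 'positive' or 'negative' up to whitespace and letter case. `label_to_is_toxic` is a hypothetical helper introduced here for illustration and is not part of the released code:

```python
def label_to_is_toxic(decoded: str) -> bool:
    """Map the model's decoded text label to a boolean (hypothetical helper)."""
    label = decoded.strip().lower()
    if label == "positive":
        return True   # 'positive' means toxic
    if label == "negative":
        return False  # 'negative' means non-toxic
    # Assumption: any other output is unexpected and should surface as an error
    raise ValueError(f"unexpected label: {decoded!r}")
```

For instance, passing the decoded string from the snippet above to `label_to_is_toxic` would yield `True` for a toxic input and `False` otherwise.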
## Evaluation
We report precision, recall, F1 score, and AUPRC on the ToxicChat (0124) test set:
| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-Large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |
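
As a quick consistency check, F1 is the harmonic mean of precision and recall, `F1 = 2 * P * R / (P + R)`. The sketch below recomputes it from the rounded values in the table. Because the inputs are rounded to four decimals, the recomputed value is only expected to agree with the reported one to within about 0.001. The helper name `f1_from_pr` is made up for illustration.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above
rows = {
    "ToxicChat-T5-Large": (0.7983, 0.8475, 0.8221),
    "OpenAI Moderation": (0.5476, 0.6989, 0.6141),
}

for name, (p, r, reported) in rows.items():
    recomputed = f1_from_pr(p, r)
    # Inputs are rounded, so small differences in the last digit are expected
    print(f"{name}: recomputed={recomputed:.4f}, reported={reported:.4f}")
```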
## Citation
```
@misc{lin2023toxicchat,
title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
year={2023},
eprint={2310.17389},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```