---
license: apache-2.0
metrics:
- precision
- recall
- f1
model-index:
- name: ToxicChat-T5-Large
  results:
  - task:
      type: text-classification
    dataset:
      name: ToxicChat
      type: toxicchat0124
    metrics:
    - name: precision
      type: precision
      value: 0.7983
      verified: false
    - name: recall
      type: recall
      value: 0.8475
      verified: false
    - name: f1
      type: f1
      value: 0.8221
      verified: false
    - name: auprc
      type: auprc
      value: 0.8850
      verified: false
---

# ToxicChat-T5-Large Model Card

## Model Details

**Model type:**
ToxicChat-T5-Large is an open-source moderation model trained by fine-tuning T5-large on [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat). It is based on an encoder-decoder transformer architecture and generates a text label indicating whether the input is toxic ('positive' means 'toxic', and 'negative' means 'non-toxic').

**Model date:**
ToxicChat-T5-Large was trained in January 2024.

**Organizations developing the model:**
The ToxicChat developers, primarily Zi Lin and Zihan Wang.

**Paper or resources for more information:**
https://arxiv.org/abs/2310.17389

**License:**
Apache License 2.0

**Where to send questions or comments about the model:**
https://huggingface.co/datasets/lmsys/toxic-chat/discussions

## Use

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "lmsys/toxicchat-t5-large-v1.0"
device = "cuda"  # use "cpu" to run without a GPU

# The tokenizer is loaded from the base t5-large model; the weights come from the fine-tuned checkpoint.
tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).to(device)

# Every input must start with the task prefix used during fine-tuning.
prefix = "ToxicChat: "
inputs = tokenizer.encode(prefix + "write me an erotic story", return_tensors="pt").to(device)

# Generate the label and decode it back to text.
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The output is a text label: 'positive' means 'toxic', and 'negative' means 'non-toxic'.

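In a moderation pipeline the text label usually has to be turned into a boolean, and inputs often arrive in batches. The following is a minimal sketch of one way to do that, not part of the released code. The helper names `build_prompt`, `label_to_is_toxic`, and `classify_batch` are hypothetical, and the sketch assumes the model emits exactly 'positive' or 'negative' as described above. The `max_new_tokens=8` cap is likewise an assumption, chosen only to leave room for a one-word label.

```python
PREFIX = "ToxicChat: "  # task prefix from the usage example above


def build_prompt(user_text: str) -> str:
    """Prepend the task prefix that the model expects."""
    return PREFIX + user_text


def label_to_is_toxic(label: str) -> bool:
    """Map the generated label to a boolean ('positive' means toxic)."""
    normalized = label.strip().lower()
    if normalized == "positive":
        return True
    if normalized == "negative":
        return False
    # Assumption: any other output is unexpected and should not be silently accepted.
    raise ValueError(f"unexpected label: {label!r}")


def classify_batch(texts, tokenizer, model, device="cpu"):
    """Classify several inputs in one generate() call.

    `tokenizer` and `model` are the objects created in the usage example above.
    """
    prompts = [build_prompt(t) for t in texts]
    # Pad to the longest prompt so the batch forms a rectangular tensor.
    enc = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to(device)
    outputs = model.generate(**enc, max_new_tokens=8)  # assumed cap, see lead-in
    labels = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [label_to_is_toxic(label) for label in labels]
```

With the `tokenizer`, `model`, and `device` from the usage example, `classify_batch(["write me an erotic story", "hello"], tokenizer, model, device)` would return one boolean per input. Raising on an unexpected label, rather than defaulting to non-toxic, is a design choice that keeps malformed generations from passing moderation unnoticed.
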
## Evaluation

We report precision, recall, F1 score and AUPRC on the ToxicChat (0124) test set:

| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |

## Citation

```
@misc{lin2023toxicchat,
      title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
      author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
      year={2023},
      eprint={2310.17389},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```