Safety classifier for Detoxifying Large Language Models via Knowledge Editing

πŸ’» Usage

from transformers import RobertaForSequenceClassification, RobertaTokenizer
safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)

You can also download DINM-Safety-Classifier manually, and set the safety_classifier_dir to your own path.

πŸ“– Citation

If you use our work, please cite our paper:

@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing}, 
      author={Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
      url={https://arxiv.org/abs/2403.14472},

}
Downloads last month
453
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.