This model is hfl/chinese-roberta-wwm-ext fine-tuned on COLDataset for Chinese offensive language detection. Usage example:
import torch
from transformers.models.bert import BertTokenizer, BertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('thu-coai/roberta-base-cold')
model = BertForSequenceClassification.from_pretrained('thu-coai/roberta-base-cold')
model.eval()
texts = ['你就是个傻逼!','黑人很多都好吃懒做,偷奸耍滑!','男女平等,黑人也很优秀。']
# Tokenize the batch, padding to the longest text
model_input = tokenizer(texts, return_tensors="pt", padding=True)
# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(**model_input).logits
# Pick the highest-scoring class for each text
prediction = torch.argmax(logits, dim=-1).tolist()
print(prediction) # --> [1, 1, 0] (0 for Non-Offensive, 1 for Offensive)
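To report label names and a confidence score instead of bare class ids, the logits can be passed through a softmax. The following is a minimal sketch in plain Python, assuming the model's logits have already been converted to nested Python lists (for example via a tensor's `.tolist()` method). The helper names `softmax` and `describe` and the example logit values are hypothetical; only the 0/1 label mapping comes from the usage example above.

```python
import math

# Label mapping from the usage example: 0 = Non-Offensive, 1 = Offensive.
ID2LABEL = {0: "Non-Offensive", 1: "Offensive"}


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return a (label, probability) pair for each row of two-class logits."""
    results = []
    for row in logits:
        probs = softmax(row)
        # Index of the most probable class (first index wins on ties).
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((ID2LABEL[best], probs[best]))
    return results


# Hypothetical logits for three inputs; these are NOT real model outputs.
example_logits = [[-2.0, 2.0], [-1.0, 1.5], [3.0, -3.0]]
for label, prob in describe(example_logits):
    print(f"{label}: {prob:.3f}")
```

The probability is only the softmax score of the winning class, so it is best read as a relative confidence and not as a calibrated estimate.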
This fine-tuned model obtains an accuracy of 82.75 and a macro-F1 of 82.39 on the COLDataset test set.
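For reference, accuracy is the fraction of correct predictions, and macro-F1 is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of their frequency. The sketch below shows how the two metrics are computed in plain Python. The function names and the toy labels are hypothetical, chosen only for illustration; this is not the evaluation script used to produce the reported numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy gold labels and predictions (hypothetical, not drawn from COLDataset).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

On the toy data, class 0 has an F1 of 0.8 and class 1 has an F1 of 2/3, so the macro-F1 is their plain average.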
Please cite the original paper if you use this model.
@inproceedings{deng2022cold,
title={{COLD}: A Benchmark for {C}hinese Offensive Language Detection},
author={Deng, Jiawen and Zhou, Jingyan and Sun, Hao and Zheng, Chujie and Mi, Fei and Meng, Helen and Huang, Minlie},
booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
year={2022}
}