RuBERTConv Toxic Editor
Model description
Tagging model for detoxification based on rubert-base-cased-conversational.
The model assigns one of four tags to each token:
- Equal = keep the token
- Replace = replace the token with a mask
- Delete = remove the token
- Insert = insert a mask before the token
Use it together with a mask filler, which fills in the resulting masks.
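To make the four tags concrete, here is a minimal sketch of how a tag sequence could be applied to a token list. The function name `apply_tags`, the `[MASK]` string, the capitalized tag names, and the English example tokens are illustrative assumptions, not part of this repository.

```python
from typing import List

MASK = "[MASK]"  # assumed mask token; the actual mask filler may use another


def apply_tags(tokens: List[str], tags: List[str], mask_token: str = MASK) -> List[str]:
    """Apply one edit tag per token (hypothetical helper, not the author's code)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    result = []
    for token, tag in zip(tokens, tags):
        if tag == "Equal":
            result.append(token)                # keep the token unchanged
        elif tag == "Replace":
            result.append(mask_token)           # token becomes a mask
        elif tag == "Delete":
            continue                            # token is dropped
        elif tag == "Insert":
            result.extend([mask_token, token])  # mask goes before the kept token
        else:
            raise ValueError(f"unknown tag: {tag}")
    return result


tokens = ["you", "are", "a", "damn", "fool"]
tags = ["Equal", "Equal", "Equal", "Delete", "Replace"]
print(apply_tags(tokens, tags))  # ['you', 'are', 'a', '[MASK]']
```

The masked sequence is what a mask filler would then complete with a neutral wording.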
Intended uses & limitations
How to use
Colab: link
import torch
from transformers import AutoTokenizer, pipeline
tagger_model_name = "IlyaGusev/rubertconv_toxic_editor"
device = "cuda" if torch.cuda.is_available() else "cpu"
device_num = 0 if device == "cuda" else -1
tagger_pipe = pipeline(
    "token-classification",
    model=tagger_model_name,
    tokenizer=tagger_model_name,
    framework="pt",
    device=device_num,
    aggregation_strategy="max"
)
text = "..."
tagger_predictions = tagger_pipe([text], batch_size=1)
sample_predictions = tagger_predictions[0]
print(sample_predictions)
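With an aggregation strategy set, the token-classification pipeline returns a list of dicts per input, each with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below turns such a list into a masked string using the character offsets. It rests on several assumptions: that the label names are `equal`, `replace`, `delete`, and `insert` (check the model config for the real ones), that `start` and `end` are integers (which requires a fast tokenizer), and that `[MASK]` is the mask token. The helper `mask_text` and the sample data are hypothetical.

```python
from typing import Dict, List


def mask_text(text: str, groups: List[Dict], mask_token: str = "[MASK]") -> str:
    """Build a masked string from aggregated tagger output (illustrative sketch)."""
    pieces = []
    cursor = 0
    for group in sorted(groups, key=lambda g: g["start"]):
        start, end = group["start"], group["end"]
        label = group["entity_group"].lower()  # assumed label names
        pieces.append(text[cursor:start])      # untagged gap, usually whitespace
        span = text[start:end]
        if label == "equal":
            pieces.append(span)
        elif label == "replace":
            pieces.append(mask_token)
        elif label == "insert":
            pieces.append(f"{mask_token} {span}")
        elif label != "delete":                # "delete" contributes nothing
            raise ValueError(f"unexpected label: {label}")
        cursor = end
    pieces.append(text[cursor:])
    # Collapse the double spaces left behind by deleted spans.
    return " ".join("".join(pieces).split())


# Hand-written sample in the pipeline's output shape, not real model output.
sample_text = "you are a damn fool"
sample_groups = [
    {"entity_group": "equal", "start": 0, "end": 9},
    {"entity_group": "delete", "start": 10, "end": 14},
    {"entity_group": "replace", "start": 15, "end": 19},
]
print(mask_text(sample_text, sample_groups))  # you are a [MASK]
```

In real use, `sample_predictions` and `text` from the snippet above would take the place of the hand-written sample, and the result would be passed to the mask filler.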
Training data
- Dataset: russe_detox_2022
Training procedure
- Parallel corpus conversion: compute_tags.py
- Training script: train.py
- Pipeline step: dvc.yaml, train_marker
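One plausible way to derive tags from a parallel pair (toxic source, neutral target) is a token-level diff. The sketch below uses Python's `difflib.SequenceMatcher`, whose opcodes happen to be named `equal`, `replace`, `delete`, and `insert`. This is an assumption about the general idea only: it is not the contents of compute_tags.py, and the function name `tags_from_pair` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def tags_from_pair(source: List[str], target: List[str]) -> List[str]:
    """Tag each source token by diffing it against the target (illustrative only)."""
    tags = ["Equal"] * len(source)
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op == "replace":
            for i in range(i1, i2):
                tags[i] = "Replace"
        elif op == "delete":
            for i in range(i1, i2):
                tags[i] = "Delete"
        elif op == "insert" and i1 < len(source):
            # The new target tokens go before source[i1], which itself is kept.
            # Insertions after the last source token cannot be expressed by
            # per-token tags here and are silently ignored in this sketch.
            tags[i1] = "Insert"
    return tags


source = ["you", "are", "a", "damn", "fool"]
target = ["you", "are", "a", "fool"]
print(tags_from_pair(source, target))
# ['Equal', 'Equal', 'Equal', 'Delete', 'Equal']
```

A real conversion script would also need to align subword tokens with these word-level tags, which this sketch leaves out.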
Eval results
TBA