---
license: mit
widget:
  - text: "[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]супер, вот только проснулся, у тебя как?"
    example_title: "Dialog example 1"
  - text: "[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм"
    example_title: "Dialog example 2"
  - text: "[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?"
    example_title: "Dialog example 3"
language:
  - ru
tags:
  - conversational
---
This classification model is based on [DeepPavlov/rubert-base-cased-sentence](https://huggingface.co/DeepPavlov/rubert-base-cased-sentence).
It scores the relevance and specificity of the last message in a dialogue, given the preceding context.

The labels mean the following:
- `relevance`: whether the last message is relevant in the context of the full dialogue.
- `specificity`: whether the last message is interesting and encourages the dialogue to continue.
The model was first pretrained on a large corpus of dialogue data in an unsupervised manner: it learned to predict whether the last response came from the real dialogue or was pulled at random from another dialogue.
It was then fine-tuned on manually labelled examples (the dataset will be posted soon).
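The pretraining objective described above can be illustrated with a small sketch of how real and random responses might be paired with a context. The function name `make_pretraining_pairs`, the data layout (each dialogue is a list of strings whose last element is the response), the one-negative-per-positive ratio, and the second sample dialogue are all assumptions made for illustration; this card does not describe the actual training pipeline.

```python
import random


def make_pretraining_pairs(dialogs, seed=0):
    """Build (context, response, label) triples for a real-vs-random objective.

    Hypothetical helper: `dialogs` is a list of at least two dialogues, each a
    list of message strings whose last element is the response.
    """
    rng = random.Random(seed)
    pairs = []
    for i, dialog in enumerate(dialogs):
        context, response = dialog[:-1], dialog[-1]
        # Positive example: the response really followed this context.
        pairs.append((context, response, 1))
        # Negative example: a response drawn at random from another dialogue.
        other = rng.choice([j for j in range(len(dialogs)) if j != i])
        pairs.append((context, dialogs[other][-1], 0))
    return pairs


# Illustrative data only; the second dialogue is made up for this sketch.
dialogs = [
    ["привет", "привет!", "как дела?", "норм, у тя как?"],
    ["ты где?", "дома", "что делаешь?", "смотрю фильм"],
]
pairs = make_pretraining_pairs(dialogs)
```

Each context appears once with its true response (label 1) and once with a response taken from a different dialogue (label 0).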
The model was trained with three messages in the context and one response. Each message was tokenized separately with `max_length = 32`.
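The input layout seen in the widget examples (`[CLS]`, then context messages joined by `[SEP]`, then `[RESPONSE_TOKEN]` and the response) can be assembled with a small helper. The name `build_model_input` and the choice to keep only the last three context messages are assumptions for illustration. The helper only joins strings; it does not reproduce the per-message truncation to 32 tokens used during training.

```python
def build_model_input(context, response, max_context=3):
    """Join context messages and a response into the card's input layout.

    Hypothetical helper: keeps only the last `max_context` context messages,
    mirroring the three-message context the model was trained with.
    """
    recent = list(context)[-max_context:]
    return "[CLS]" + "[SEP]".join(recent) + "[RESPONSE_TOKEN]" + response


text = build_model_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
# [CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, since the special tokens are already written into the text.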
Performance on the validation split (the dataset will be posted soon), using the thresholds that performed best on the validation samples:

|             |   threshold |   F0.5 |   ROC AUC |
|:------------|------------:|-------:|----------:|
| relevance   |        0.49 |   0.84 |      0.79 |
| specificity |        0.53 |   0.83 |      0.83 |
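
To turn the model's probabilities into binary decisions, the thresholds from the table can be applied per label. The sketch below assumes a score at or above the threshold counts as positive; this rule, the names `THRESHOLDS` and `apply_thresholds`, and the sample scores are illustrative assumptions, not part of the released code.

```python
# Thresholds copied from the validation table.
THRESHOLDS = {"relevance": 0.49, "specificity": 0.53}


def apply_thresholds(relevance, specificity, thresholds=THRESHOLDS):
    """Map the two probabilities to booleans (assumed rule: score >= threshold)."""
    scores = {"relevance": relevance, "specificity": specificity}
    return {label: scores[label] >= thresholds[label] for label in thresholds}


# Made-up scores, for illustration only.
decision = apply_thresholds(relevance=0.62, specificity=0.40)
print(decision)  # {'relevance': True, 'specificity': False}
```

Because these thresholds were tuned on the validation samples, they may need re-tuning for dialogue data from a different domain.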
How to use:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-quality-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-quality-classifier-base')

# Special tokens are already written into the text, so the tokenizer must not add its own.
# truncation=True is needed for max_length to take effect.
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')

with torch.inference_mode():
    logits = model(**inputs).logits
    # The two labels are scored independently, hence sigmoid rather than softmax.
    probas = torch.sigmoid(logits)[0].cpu().detach().numpy()

relevance, specificity = probas
```
You can try the model interactively in this [app](https://huggingface.co/spaces/tinkoff-ai/response-quality-classifiers).

The work was done during an internship at Tinkoff by [egoriyaa](https://github.com/egoriyaa), mentored by [solemn-leader](https://huggingface.co/solemn-leader).