---
datasets:
- quora
language: en
license: mit
pipeline_tag: text-classification
tags:
- roberta
- text-classification
---
# Cross-Encoder for Quora Duplicate Questions Detection

This model was trained using the [SentenceTransformers](https://sbert.net) [Cross-Encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html) class.

This model uses [roberta-large](https://huggingface.co/roberta-large) as its base model.

## Training Data

This model was trained on the [Quora Duplicate Questions](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs) dataset.

The model predicts a score between 0 and 1 that indicates how likely the two given questions are to be duplicates.

Note: The model is not suitable for estimating the similarity of questions. For example, the two questions "How to learn Java" and "How to learn Python" will receive a rather low score, as they are not duplicates.
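Since the score is a likelihood between 0 and 1, a common way to get a yes/no decision is to compare it against a cutoff. The following is a minimal sketch of that idea. The helper name `label_duplicates` and the default threshold of 0.5 are illustrative assumptions, not part of this model or of SentenceTransformers; a suitable threshold would need to be chosen on your own validation data.

```python
from typing import Iterable, List


def label_duplicates(scores: Iterable[float], threshold: float = 0.5) -> List[bool]:
    """Map duplicate scores in [0, 1] to True/False decisions.

    `threshold` is a hypothetical default chosen for illustration only.
    """
    # float() lets this accept plain Python floats as well as NumPy scalars
    return [float(score) >= threshold for score in scores]


# Made-up scores, only to show the shape of the result
print(label_duplicates([0.93, 0.08, 0.51]))  # [True, False, True]
```

Raising the threshold trades recall for precision: fewer pairs are flagged, but those that are flagged are more likely to be true duplicates.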

## Usage and Performance

The trained model can be used like this:

```python
from sentence_transformers import CrossEncoder

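# Replace 'model_name' with the identifier of this model
model = CrossEncoder('model_name')
# Each tuple is one question pair; predict returns one score per pair
scores = model.predict([('Question 1', 'Question 2'), ('Question 3', 'Question 4')])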

print(scores)
```
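One possible use of the `predict` call above is to find the most likely duplicate of a new question among a set of existing ones. The sketch below pairs the new question with every candidate and sorts by score. The function `rank_candidates` and the stand-in scorer `fake_predict` are hypothetical and exist only so the example runs without downloading a model; in practice you would pass `model.predict` instead.

```python
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    query: str,
    candidates: Sequence[str],
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score `query` against each candidate and sort by descending score."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = predict(pairs)  # one score per (query, candidate) pair
    scored = [(candidate, float(score)) for candidate, score in zip(candidates, scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def fake_predict(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (an assumption for this sketch, NOT the real model)."""
    return [1.0 if a.lower() == b.lower() else 0.0 for a, b in pairs]


ranked = rank_candidates(
    "How to learn Python",
    ["How to learn Java", "how to learn python"],
    fake_predict,  # with the real model: model.predict
)
print(ranked[0][0])
```

Passing the scoring function as an argument keeps the ranking logic independent of any particular model, which also makes it easy to test.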