---
license: mit
datasets:
- deutsche-telekom/ger-backtrans-paraphrase
- paws-x
- stsb_multi_mt
language:
- de
model-index:
- name: e5-base-sts-en-de
  results:
  - task:
      type: semantic textual similarity
    dataset:
      type: stsb_multi_mt
      name: stsb_multi_mt
    metrics:
    - type: spearmanr
      value: 0.904
---

**INFO**: The model is being continuously updated.

This model is a [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) fine-tuned for the task of semantic textual similarity. A minimal usage sketch is shown below.
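The snippet below is a minimal sketch using the sentence-transformers library; the repository id and the E5-style "query: " prefix are assumptions, so adjust them to the actual release.

```python
# Minimal usage sketch (assumption: the model is published on the Hugging Face Hub
# under a repository id ending in "e5-base-sts-en-de"; replace it with the real id).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("e5-base-sts-en-de")  # hypothetical repository id

# E5-based models are typically queried with a "query: " prefix; adjust this if
# the released model expects plain sentences instead.
sentences = [
    "query: Das Wetter ist heute schön.",
    "query: Heute scheint die Sonne.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity between the two sentence embeddings.
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```
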
## Model Training

The model has been fine-tuned on the German subsets of the following datasets:

- [German paraphrase corpus by Philip May](https://huggingface.co/datasets/deutsche-telekom/ger-backtrans-paraphrase)
- [paws-x](https://huggingface.co/datasets/paws-x)
- [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt)

The training procedure can be divided into two stages (a rough training sketch follows the list):

- training on paraphrase datasets with the Multiple Negatives Ranking Loss
- training on semantic textual similarity datasets using the Cosine Similarity Loss
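The following sketch illustrates such a two-stage setup with the sentence-transformers training API; the dataset column names, batch sizes, and epoch counts are illustrative assumptions, not the exact configuration used for this model.

```python
# Illustrative two-stage fine-tuning sketch (assumptions: sentence-transformers
# fit() API, dataset column names, batch sizes, and epoch counts).
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Stage 1: paraphrase pairs with the Multiple Negatives Ranking Loss.
paraphrases = load_dataset("deutsche-telekom/ger-backtrans-paraphrase", split="train")
stage1 = [InputExample(texts=[row["de"], row["en_de"]]) for row in paraphrases]
stage1_loader = DataLoader(stage1, shuffle=True, batch_size=32)
model.fit(
    train_objectives=[(stage1_loader, losses.MultipleNegativesRankingLoss(model))],
    epochs=1,
)

# Stage 2: STS pairs with the Cosine Similarity Loss (gold scores scaled to [0, 1]).
sts = load_dataset("stsb_multi_mt", "de", split="train")
stage2 = [
    InputExample(
        texts=[row["sentence1"], row["sentence2"]],
        label=row["similarity_score"] / 5.0,
    )
    for row in sts
]
stage2_loader = DataLoader(stage2, shuffle=True, batch_size=16)
model.fit(
    train_objectives=[(stage2_loader, losses.CosineSimilarityLoss(model))],
    epochs=1,
)
```
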
## Results

The model achieves the following Spearman rank correlations on the German portion of stsb_multi_mt (an evaluation sketch follows the list):

- 0.920 on the validation split
- 0.904 on the test split
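One way to reproduce these numbers is sketched below with sentence-transformers' EmbeddingSimilarityEvaluator; the repository id is a placeholder and the scaling of the 0 to 5 gold scores follows the stsb_multi_mt dataset card.

```python
# Evaluation sketch for the German STS-B test split (repository id is a placeholder).
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("e5-base-sts-en-de")  # hypothetical repository id

test = load_dataset("stsb_multi_mt", "de", split="test")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=list(test["sentence1"]),
    sentences2=list(test["sentence2"]),
    scores=[s / 5.0 for s in test["similarity_score"]],
    name="stsb-de-test",
)
# Returns the main similarity score (Spearman correlation) in older versions,
# or a dict of metrics in sentence-transformers v3+.
print(evaluator(model))
```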