danielheinz/e5-base-sts-en-de


---
license: mit
datasets:
- deutsche-telekom/ger-backtrans-paraphrase
- paws-x
- stsb_multi_mt
language:
- de
model-index:
- name: e5-base-sts-en-de
  results:
  - task:
      type: semantic textual similarity
    dataset:
      type: stsb_multi_mt
      name: stsb_multi_mt
    metrics:
    - type: spearmanr
      value: 0.904
---
INFO: The model is being continuously updated.

This model is [multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) fine-tuned for semantic textual similarity (STS).
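A minimal sketch of scoring one sentence pair. It assumes the model loads through the `sentence-transformers` library under the repository ID `danielheinz/e5-base-sts-en-de`, which this card does not confirm. The helper only needs an `encode` callable that maps a list of strings to vectors; `cosine` and `pair_similarity` are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_similarity(
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    first: str,
    second: str,
) -> float:
    """Embed both sentences in one call and return their cosine similarity."""
    emb_first, emb_second = encode([first, second])
    return cosine(emb_first, emb_second)


# Assumed usage (needs `pip install sentence-transformers` and a model download):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("danielheinz/e5-base-sts-en-de")
# score = pair_similarity(model.encode, "Ein Mann spielt Gitarre.",
#                         "Jemand spielt ein Instrument.")
```

Passing the encoder as a callable keeps the helper independent of any one library, so the same function works with a different embedding backend.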

## Model Training
The model has been fine-tuned on the German subsets of the following datasets:
- [German paraphrase corpus by Philip May](https://huggingface.co/datasets/deutsche-telekom/ger-backtrans-paraphrase)
- [paws-x](https://huggingface.co/datasets/paws-x)
- [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt)

Training proceeds in two stages:
- first, training on the paraphrase datasets with the Multiple Negatives Ranking Loss
- then, training on the semantic textual similarity dataset with the Cosine Similarity Loss
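The two objectives named above can be sketched in plain Python on already-computed embeddings. This illustrates the general technique, not the author's training code: the `scale` of 20.0 and gold scores normalized to the range 0 to 1 are assumptions, and the function names are chosen for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    scale: float = 20.0,  # assumed temperature-like factor
) -> float:
    """Mean cross-entropy over in-batch candidates.

    positives[i] is the paraphrase of anchors[i]; every other positive in
    the batch acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def cosine_similarity_loss(
    emb_a: Sequence[Vector],
    emb_b: Sequence[Vector],
    gold: Sequence[float],  # assumed to be normalized to [0, 1]
) -> float:
    """Mean squared error between predicted cosine similarity and gold score."""
    errors = [(cosine(a, b) - g) ** 2 for a, b, g in zip(emb_a, emb_b, gold)]
    return sum(errors) / len(errors)
```

The first loss needs only positive pairs, which suits paraphrase corpora; the second needs graded similarity labels, which suits STS data.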

## Results
The model achieves the following Spearman correlation scores on stsb_multi_mt:
- 0.920 on the validation subset
- 0.904 on the test subset
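For reference, a Spearman score of this kind is the Pearson correlation between the ranks of the predicted similarities and the ranks of the gold scores. The sketch below shows that computation in plain Python. It is not the author's evaluation script, and the function names are chosen for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman rank correlation between predicted and gold scores."""
    return pearson(average_ranks(predicted), average_ranks(gold))
```

Because only ranks matter, the score is unaffected by the scale of the gold labels, so raw and normalized similarity scores give the same result.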