AnglE📐: Angle-optimized Text Embeddings

It is Angle 📐, not Angel 👼.

🔥 A New SOTA Model for Semantic Textual Similarity!

Github: https://github.com/SeanLee97/AnglE

STS Results

Model	ATEC	BQ	LCQMC	PAWSX	STS-B	SOHU-dd	SOHU-dc	Avg.
^shibing624/text2vec-bge-large-chinese	38.41	61.34	71.72	35.15	76.44	71.81	63.15	59.72
^shibing624/text2vec-base-chinese-paraphrase	44.89	63.58	74.24	40.90	78.93	76.70	63.30	63.08
SeanLee97/angle-roberta-wwm-base-zhnli-v1	49.49	72.47	78.33	59.13	77.14	72.36	60.53	67.06
SeanLee97/angle-llama-7b-zhnli-v1	50.44	71.95	78.90	56.57	81.11	68.11	52.02	65.59

^ denotes baselines; their results are taken from https://github.com/shibing624/text2vec
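A quick arithmetic check suggests that Avg. is the unweighted mean of the seven per-dataset scores. The sketch below recomputes it for the two AnglE rows, with scores copied from the table. The equal-weighting assumption is ours; the table itself does not state how Avg. is computed.

```python
# Per-dataset scores copied from the STS Results table, in column order:
# ATEC, BQ, LCQMC, PAWSX, STS-B, SOHU-dd, SOHU-dc.
scores = {
    "SeanLee97/angle-roberta-wwm-base-zhnli-v1": [49.49, 72.47, 78.33, 59.13, 77.14, 72.36, 60.53],
    "SeanLee97/angle-llama-7b-zhnli-v1": [50.44, 71.95, 78.90, 56.57, 81.11, 68.11, 52.02],
}


def unweighted_mean(values: list[float]) -> float:
    """Average the scores with equal weight per dataset, rounded to 2 decimals."""
    return round(sum(values) / len(values), 2)


# Assumption: Avg. weights every dataset equally.
averages = {model: unweighted_mean(vals) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
```

Both recomputed values agree with the Avg. column for these rows.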

Usage

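from angle_emb import AnglE

# Load the model with CLS pooling; .cuda() moves it to the GPU (requires CUDA).
angle = AnglE.from_pretrained('SeanLee97/angle-roberta-wwm-base-zhnli-v1', pooling_strategy='cls').cuda()

# Encode a single sentence ('你好世界' means "Hello, world").
vec = angle.encode('你好世界', to_numpy=True)
print(vec)

# Encode a batch of sentences in one call.
vecs = angle.encode(['你好世界1', '你好世界2'], to_numpy=True)
print(vecs)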
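To compare the encoded sentences, a common choice for angle-based embeddings is cosine similarity. The sketch below assumes NumPy is installed. It uses a small hypothetical 2 × 4 array as a stand-in for the real `vecs` output, which would be as wide as the model's hidden size. The `cosine_similarity` helper is illustrative and is not part of the angle_emb API.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical stand-in for vecs = angle.encode([...], to_numpy=True):
# one row per input sentence.
vecs = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [1.0, 1.5, 0.5, 1.0],
])

score = cosine_similarity(vecs[0], vecs[1])
print(f"cosine similarity: {score:.4f}")
```

With real model output, replace the stand-in array with the `vecs` returned by `angle.encode`. Scores closer to 1 indicate more similar sentences.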

Citation

You are welcome to use our code and pre-trained models. If you do, please support us by citing our work as follows:

@article{li2023angle,
  title={AnglE-Optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
