# AnglE 📐: Angle-optimized Text Embeddings

It is Angle 📐, not Angel 👼.

🔥 A New SOTA Model for Semantic Textual Similarity!

GitHub: https://github.com/SeanLee97/AnglE
## STS Results
| Model | ATEC | BQ | LCQMC | PAWSX | STS-B | SOHU-dd | SOHU-dc | Avg. |
|---|---|---|---|---|---|---|---|---|
| ^shibing624/text2vec-bge-large-chinese | 38.41 | 61.34 | 71.72 | 35.15 | 76.44 | 71.81 | 63.15 | 59.72 |
| ^shibing624/text2vec-base-chinese-paraphrase | 44.89 | 63.58 | 74.24 | 40.90 | 78.93 | 76.70 | 63.30 | 63.08 |
| SeanLee97/angle-roberta-wwm-base-zhnli-v1 | 49.49 | 72.47 | 78.33 | 59.13 | 77.14 | 72.36 | 60.53 | 67.06 |
| SeanLee97/angle-llama-7b-zhnli-v1 | 50.44 | 71.95 | 78.90 | 56.57 | 81.11 | 68.11 | 52.02 | 65.59 |
^ denotes baseline models; their results are taken from https://github.com/shibing624/text2vec.
## Usage
```python
from angle_emb import AnglE

# Load the pre-trained AnglE model; 'cls' pooling uses the [CLS] token embedding.
angle = AnglE.from_pretrained('SeanLee97/angle-roberta-wwm-base-zhnli-v1', pooling_strategy='cls').cuda()

# Encode a single sentence ("Hello world") into a numpy vector.
vec = angle.encode('你好世界', to_numpy=True)
print(vec)

# Encode a batch of sentences at once.
vecs = angle.encode(['你好世界1', '你好世界2'], to_numpy=True)
print(vecs)
```
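Since the embeddings are meant for semantic textual similarity, a common next step is to score a sentence pair with cosine similarity. Below is a minimal sketch: the `cos_sim` helper is our own and not part of `angle_emb`, and the second sentence ("goodbye world") is just an illustrative input.

```python
import numpy as np
from angle_emb import AnglE

angle = AnglE.from_pretrained('SeanLee97/angle-roberta-wwm-base-zhnli-v1', pooling_strategy='cls').cuda()

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors (hypothetical helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "你好世界" = "Hello world", "再见世界" = "Goodbye world"
vecs = angle.encode(['你好世界', '再见世界'], to_numpy=True)
print(cos_sim(vecs[0], vecs[1]))  # higher score = more semantically similar
```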
## Citation
You are welcome to use our code and pre-trained models. If you do, please support us by citing our work as follows:
```bibtex
@article{li2023angle,
  title={AnglE-Optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```