update README.md
README.md
@@ -2614,7 +2614,6 @@ import torch.nn.functional as F
 
 from torch import Tensor
 from transformers import AutoTokenizer, AutoModel
-from transformers.modeling_outputs import BaseModelOutput
 
 
 def average_pool(last_hidden_states: Tensor,
@@ -2636,7 +2635,7 @@ model = AutoModel.from_pretrained('intfloat/e5-small')
 # Tokenize the input texts
 batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
 
-outputs
+outputs = model(**batch_dict)
 embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
 
 # (Optionally) normalize embeddings
@@ -2653,3 +2652,21 @@ Please refer to our paper at [https://arxiv.org/pdf/2212.03533.pdf](https://arxiv.org/pdf/2212.03533.pdf)
 
 Check out [unilm/e5](https://github.com/microsoft/unilm/tree/master/e5) to reproduce evaluation results
 on the [BEIR](https://arxiv.org/abs/2104.08663) and [MTEB benchmark](https://arxiv.org/abs/2210.07316).
+
+## Citation
+
+If you find our paper or models helpful, please consider citing as follows:
+
+```
+@article{wang2022text,
+  title={Text Embeddings by Weakly-Supervised Contrastive Pre-training},
+  author={Wang, Liang and Yang, Nan and Huang, Xiaolong and Jiao, Binxing and Yang, Linjun and Jiang, Daxin and Majumder, Rangan and Wei, Furu},
+  journal={arXiv preprint arXiv:2212.03533},
+  year={2022}
+}
+```
+
+## Limitations
+
+This model only works for English texts. Long texts will be truncated to at most 512 tokens.
+
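For reference, the usage snippet this commit repairs reads roughly as follows once the fix is applied. This is a minimal sketch: the body of `average_pool`, the example `input_texts`, and the normalization call are not shown in the hunks above, so they are assumed from the surrounding README and should be treated as illustrative rather than verbatim.

```python
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def average_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    # Mask out padding positions, then mean-pool token embeddings over the sequence.
    # (Body assumed; it is not part of the hunks in this commit.)
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


# Hypothetical example inputs: E5 models expect "query: " / "passage: " prefixes.
input_texts = ['query: how much protein should a female eat',
               'passage: Most adult women need roughly 46 grams of protein per day.']

tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-small')
model = AutoModel.from_pretrained('intfloat/e5-small')

# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')

outputs = model(**batch_dict)  # the line restored by this commit
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# (Optionally) normalize embeddings so dot products become cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # (2, hidden_size)
```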
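The 512-token cap mentioned under "Limitations" follows directly from the `max_length=512, truncation=True` arguments in the tokenizer call above; a quick way to see it, reusing the `tokenizer` from that sketch:

```python
# Any text longer than 512 tokens is cut down to max_length before encoding.
long_text = 'passage: ' + 'protein intake ' * 1000  # deliberately overlong input
batch = tokenizer([long_text], max_length=512, padding=True, truncation=True, return_tensors='pt')
print(batch['input_ids'].shape)  # torch.Size([1, 512])
```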