---
pipeline_tag: sentence-similarity
language: en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---
# hku-nlp/instructor-large
This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, or a document) to a fixed-length vector at test time, **without further training**. With instructions, the embeddings become **domain-specific** (e.g., specialized for science or finance) and **task-aware** (e.g., customized for classification or information retrieval).
The model is easy to use with the version of the `sentence-transformers` library included in the instructor-embedding repository.
## Installation
```bash
git clone https://github.com/HKUNLP/instructor-embedding
cd instructor-embedding/sentence-transformers
pip install -e .
```
## Compute your customized embeddings
After installation, you can compute domain-specific, task-aware embeddings like this:
```python
from sentence_transformers import SentenceTransformer

sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
# The instruction tells the model which domain and task the embedding is for.
instruction = "Represent the Science title; Input:"
model = SentenceTransformer('hku-nlp/instructor-large')
# Each input is a list of the form [instruction, text, 0].
embeddings = model.encode([[instruction, sentence, 0]])
print(embeddings)
```
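When many texts share the same instruction, the input list can be built programmatically instead of by hand. The sketch below assumes only the `[instruction, text, 0]` input format used in the example above; `make_inputs` is a hypothetical helper for illustration, not part of the library.

```python
def make_inputs(instruction, texts):
    """Pair one instruction with each text, in the [instruction, text, 0] format.

    Hypothetical helper for illustration; it is not part of sentence-transformers.
    """
    return [[instruction, text, 0] for text in texts]


titles = [
    "3D ActionSLAM: wearable person tracking in multi-floor environments",
    "Parton energy loss in QCD matter",
]
# Every title gets the same science-domain instruction.
inputs = make_inputs("Represent the Science title; Input:", titles)
print(inputs)
```

The resulting list can then be passed to `model.encode(...)` in the same way as the single-item list above.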
## Calculate sentence similarities
You can also use the model to compute similarities between two groups of sentences using **customized embeddings**:
```python
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('hku-nlp/instructor-large')

# Each entry pairs a domain-specific instruction with a sentence.
sentences_a = [['Represent the Science sentence; Input: ', 'Parton energy loss in QCD matter', 0],
               ['Represent the Financial statement; Input: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.', 0]]
sentences_b = [['Represent the Science sentence; Input: ', 'The Chiral Phase Transition in Dissipative Dynamics', 0],
               ['Represent the Financial statement; Input: ', 'The funds rose less than 0.5 per cent on Friday', 0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
# similarities[i][j] is the cosine similarity between sentences_a[i] and sentences_b[j].
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
```
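For reference, the same kind of similarity matrix can be computed directly with NumPy: normalize each embedding to unit length, then take dot products between the two groups. This is a minimal sketch that uses hypothetical 3-dimensional toy vectors in place of real model outputs, and it assumes no embedding is an all-zero vector (that would divide by zero). Taking `argmax` along each row then picks, for every sentence in group A, its most similar sentence in group B, a simple way to use the matrix for retrieval-style matching.

```python
import numpy as np


def pairwise_cosine(a, b):
    """Cosine similarity between every row of `a` and every row of `b`.

    Assumes no row is an all-zero vector.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale each row to unit L2 norm.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Dot products of unit vectors are cosine similarities.
    return a_unit @ b_unit.T


# Hypothetical toy vectors standing in for real embeddings.
embeddings_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
embeddings_b = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]])

similarities = pairwise_cosine(embeddings_a, embeddings_b)
# For each row (group A sentence), the index of the most similar group B sentence.
best_match = similarities.argmax(axis=1)
print(similarities)
print(best_match)
```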