Sentence Similarity
sentence-transformers
PyTorch
Transformers
English
t5
text-embedding
embeddings
information-retrieval
beir
text-classification
language-model
text-clustering
text-semantic-similarity
text-evaluation
prompt-retrieval
text-reranking
feature-extraction
natural_questions
ms_marco
fever
hotpot_qa
mteb
Eval Results
multi-train committed
Commit 51e835d
Parent(s): 3f6f495
Update README.md

README.md CHANGED
@@ -10,10 +10,12 @@ tags:
 ---
 
 # hkunlp/instructor-xl
-
-
+We introduce **Instructor**👨‍🏫, an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) ***by simply providing the task instruction, without any finetuning***. Instructor👨‍🏫 achieves state-of-the-art results on 70 diverse embedding tasks!
 The model is easy to use with the `sentence-transformers` library.
 
+## Quick start
+<hr />
+
 ## Installation
 ```bash
 git clone https://github.com/HKUNLP/instructor-embedding
@@ -32,14 +34,25 @@ embeddings = model.encode([[instruction,sentence,0]])
 print(embeddings)
 ```
 
+## Use cases
+<hr />
+
+## Calculate embeddings for your customized texts
+If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions:
+
+Represent the `domain` `text_type` for `task_objective`; Input:
+* `domain` is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc.
+* `text_type` is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc.
+* `task_objective` is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc.
+
 ## Calculate Sentence similarities
 You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.
 ```python
 from sklearn.metrics.pairwise import cosine_similarity
 sentences_a = [['Represent the Science sentence; Input: ','Parton energy loss in QCD matter',0],
-               ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]
+               ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]]
 sentences_b = [['Represent the Science sentence; Input: ','The Chiral Phase Transition in Dissipative Dynamics', 0],
-               ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]
+               ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]]
 embeddings_a = model.encode(sentences_a)
 embeddings_b = model.encode(sentences_b)
 similarities = cosine_similarity(embeddings_a,embeddings_b)
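The unified instruction template added in this commit ("Represent the `domain` `text_type` for `task_objective`; Input:") can be sketched as a small helper. This is an illustrative sketch only; `build_instruction` is a hypothetical name, not part of the `InstructorEmbedding` package:

```python
def build_instruction(text_type, domain=None, task_objective=None):
    """Assemble an INSTRUCTOR-style instruction string.

    Follows the unified template from the model card:
    'Represent the `domain` `text_type` for `task_objective`; Input:'
    where text_type is required and the other two parts are optional.
    """
    parts = ["Represent the"]
    if domain:
        parts.append(domain)
    parts.append(text_type)
    if task_objective:
        parts.append(f"for {task_objective}")
    return " ".join(parts) + "; Input: "

# Reproduces the instruction used in the similarity example.
print(build_instruction("sentence", domain="Science"))
# -> Represent the Science sentence; Input: 
```

Either optional part can be dropped, e.g. `build_instruction('document', task_objective='retrieving a supporting passage')`.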
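The similarity snippet in this commit assumes a `model` object has already been loaded; the shape of the computation itself can be checked without downloading the model, using stand-in vectors (the 4-dimensional embeddings below are placeholders, not real INSTRUCTOR outputs):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in embeddings: two sentences per group, 4-dimensional vectors
# (real INSTRUCTOR embeddings are much higher-dimensional).
embeddings_a = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0, 0.0]])
embeddings_b = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 0.5, 0.5, 0.0]])

# Row i, column j holds the cosine similarity between the i-th
# sentence of group a and the j-th sentence of group b.
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities.shape)  # (2, 2)
```

Each row of `similarities` corresponds to one sentence in `sentences_a`, each column to one sentence in `sentences_b`.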