Sentence Transformers integration

#4
by tomaarsen HF staff - opened
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
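
To illustrate what these flags select, the sketch below contrasts CLS pooling (enabled in this config) with mean pooling (disabled here) on a toy token-embedding matrix. It is a plain-Python illustration, not the Sentence Transformers implementation: `cls_pool`, `mean_pool`, and the toy values are hypothetical, and real embeddings for this model have 384 dimensions rather than 3.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's ([CLS]) embedding as the sentence vector."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the embeddings of non-padding tokens (shown for contrast only)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: 4 token positions, 3 dimensions; the last position is padding.
tokens = [
    [1.0, 0.0, 2.0],  # [CLS]
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, masked out by the attention mask
]
mask = [1, 1, 1, 0]

cls_vector = cls_pool(tokens)
mean_vector = mean_pool(tokens, mask)
print("CLS pooling: ", cls_vector)
print("Mean pooling:", mean_vector)
```

With `pooling_mode_cls_token` set to `true`, only the first row contributes to the sentence embedding; the other tokens influence it only through the transformer's attention layers.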
README.md CHANGED
@@ -2934,6 +2934,38 @@ Based on the [intfloat/e5-large-unsupervised](https://huggingface.co/intfloat/e5

  ## Usage

+ ### Using Sentence Transformers
+
+ You can use the sentence-transformers package to use a snowflake-arctic-embed model, as shown below.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-xs")
+
+ queries = ['what is snowflake?', 'Where can I get the best tacos?']
+ documents = ['The Data Cloud!', 'Mexico City of Course!']
+
+ query_embeddings = model.encode(queries, prompt_name="query")
+ document_embeddings = model.encode(documents)
+
+ scores = query_embeddings @ document_embeddings.T
+ for query, query_scores in zip(queries, scores):
+     doc_score_pairs = list(zip(documents, query_scores))
+     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
+     # Output passages & scores
+     print("Query:", query)
+     for document, score in doc_score_pairs:
+         print(score, document)
+ ```
+ ```
+ Query: what is snowflake?
+ 0.57515126 The Data Cloud!
+ 0.45798576 Mexico City of Course!
+ Query: Where can I get the best tacos?
+ 0.5636022 Mexico City of Course!
+ 0.5044898 The Data Cloud!
+ ```

  ### Using Huggingface transformers

config_sentence_transformers.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.7.0.dev0",
+     "transformers": "4.39.3",
+     "pytorch": "2.1.0+cu121"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null
+ }
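
As a rough sketch of how the `prompts` and `default_prompt_name` entries behave at encode time: Sentence Transformers prepends the selected prompt string to each input text before tokenization, and with `default_prompt_name` set to `null` no prompt is applied unless one is requested (as the README does with `prompt_name="query"`). The helper `apply_prompt` below is hypothetical and only mimics that behaviour on plain strings; it is not the library's code.

```python
from typing import Dict, List, Optional

# Values copied from config_sentence_transformers.json above.
PROMPTS: Dict[str, str] = {
    "query": "Represent this sentence for searching relevant passages: "
}
DEFAULT_PROMPT_NAME: Optional[str] = None


def apply_prompt(texts: List[str], prompt_name: Optional[str] = None) -> List[str]:
    """Prepend the named prompt to every text; fall back to the default name."""
    name = prompt_name if prompt_name is not None else DEFAULT_PROMPT_NAME
    if name is None:
        # No prompt requested and no default configured: texts pass through.
        return list(texts)
    if name not in PROMPTS:
        raise ValueError(f"Unknown prompt name: {name!r}")
    return [PROMPTS[name] + text for text in texts]


query_inputs = apply_prompt(["what is snowflake?"], prompt_name="query")
document_inputs = apply_prompt(["The Data Cloud!"])
print(query_inputs)
print(document_inputs)
```

Queries therefore carry the instruction prefix, while documents are embedded as-is.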
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
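
Because the last module in this pipeline is `Normalize`, the embeddings come out with unit L2 length, so a plain dot product (as in the README's `query_embeddings @ document_embeddings.T`) equals cosine similarity. The sketch below checks this on toy 2-dimensional vectors in plain Python; `l2_normalize`, `dot`, and `cosine` are hypothetical helpers for illustration, not part of the library.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy "embeddings" before the Normalize step.
query = [3.0, 4.0]
document = [4.0, 3.0]

q_unit = l2_normalize(query)
d_unit = l2_normalize(document)

score = dot(q_unit, d_unit)
print("dot of normalized vectors:", score)
print("cosine of raw vectors:    ", cosine(query, document))
```

Both printed values agree up to floating-point rounding, which is why the usage example can rank documents with a matrix product alone.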
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
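
A simplified sketch of what these two settings imply: text is not lowercased before tokenization, and inputs longer than 512 tokens are cut off. The helpers `prepare_text` and `truncate` are hypothetical, and the plain slice is an assumption made for brevity; a real tokenizer also accounts for special tokens when truncating.

```python
from typing import List

# Values copied from sentence_bert_config.json above.
MAX_SEQ_LENGTH = 512
DO_LOWER_CASE = False


def prepare_text(text: str) -> str:
    """Lowercase only when the config asks for it (it does not here)."""
    return text.lower() if DO_LOWER_CASE else text


def truncate(token_ids: List[int], max_len: int = MAX_SEQ_LENGTH) -> List[int]:
    """Keep at most max_len token ids (simplified: ignores special tokens)."""
    return token_ids[:max_len]


long_input = list(range(600))   # stand-in for 600 token ids
short_input = list(range(10))   # stand-in for 10 token ids

print(prepare_text("The Data Cloud!"))
print(len(truncate(long_input)), len(truncate(short_input)))
```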