zpn committed on
Commit
c111a6a
1 Parent(s): bc1efb5

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -2657,12 +2657,15 @@ Training data to train the models is released in its entirety. For more details,
2657
 
2658
  ## Usage
2659
 
 
 
 
2660
  ### Sentence Transformers
2661
  ```python
2662
  from sentence_transformers import SentenceTransformer
2663
 
2664
  model = SentenceTransformer("nomic-ai/nomic-embed-text-v1-ablated", trust_remote_code=True)
2665
- sentences = ['What is TSNE?', 'Who is Laurens van der Maaten?']
2666
  embeddings = model.encode(sentences)
2667
  print(embeddings)
2668
  ```
@@ -2679,7 +2682,7 @@ def mean_pooling(model_output, attention_mask):
2679
  input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
2680
  return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
2681
 
2682
- sentences = ['What is TSNE?', 'Who is Laurens van der Maaten?']
2683
 
2684
  tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
2685
  model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-ablated', trust_remote_code=True)
@@ -2702,8 +2705,8 @@ The model natively supports scaling of the sequence length past 2048 tokens. To
2702
  + tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
2703
 
2704
 
2705
- - model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-unsupervised', trust_remote_code=True)
2706
- + model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-unsupervised', trust_remote_code=True, rotary_scaling_factor=2)
2707
  ```
2708
 
2709
  # Join the Nomic Community
 
2657
 
2658
  ## Usage
2659
 
2660
+ Note `nomic-embed-text` requires prefixes! We support the prefixes `[search_query, search_document, classification, clustering]`.
2661
+ For retrieval applications, you should prepend `search_document` for all your documents and `search_query` for your queries.
2662
+
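The prefix rule in the note above can be sketched as a small helper. This is only an illustration, not part of the model's API: `add_prefix`, `VALID_PREFIXES`, and the sample document text are hypothetical names and data, and the `prefix: text` format with a colon and a space is assumed from the `search_query: ...` examples in this README.

```python
# Task prefixes listed in the README note above.
VALID_PREFIXES = ("search_query", "search_document", "classification", "clustering")


def add_prefix(texts, task):
    """Prepend a task prefix to each text (hypothetical helper, not a library function)."""
    if task not in VALID_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    # Assumed format: '<prefix>: <text>', matching the README examples.
    return [f"{task}: {text}" for text in texts]


# Retrieval setup: documents get `search_document`, queries get `search_query`.
documents = add_prefix(["An example document about TSNE."], "search_document")
queries = add_prefix(["What is TSNE?", "Who is Laurens van der Maaten?"], "search_query")
```

The resulting lists can then be passed to `model.encode(...)` in the same way as the `sentences` list in the examples that follow.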
2663
  ### Sentence Transformers
2664
  ```python
2665
  from sentence_transformers import SentenceTransformer
2666
 
2667
  model = SentenceTransformer("nomic-ai/nomic-embed-text-v1-ablated", trust_remote_code=True)
2668
+ sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
2669
  embeddings = model.encode(sentences)
2670
  print(embeddings)
2671
  ```
 
2682
  input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
2683
  return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
2684
 
2685
+ sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
2686
 
2687
  tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
2688
  model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-ablated', trust_remote_code=True)
 
2705
  + tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
2706
 
2707
 
2708
+ - model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-ablated', trust_remote_code=True)
2709
+ + model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1-ablated', trust_remote_code=True, rotary_scaling_factor=2)
2710
  ```
2711
 
2712
  # Join the Nomic Community