Introduction

We introduce NV-Retriever-v1, an embedding model optimized for retrieval. It achieves the highest score, 60.9, on the 15 retrieval tasks of the MTEB Retrieval benchmark (as of July 12, 2024).

This model is ready for non-commercial use.

For commercial use, consider the NeMo Retriever Microservices (NIMs), whose models are trained with the same techniques on different datasets.

Technical details can be found in our paper: NV-Retriever: Improving text embedding models with effective hard-negative mining

How to use

You must set trust_remote_code=True when loading the model, because it relies on a custom module that enables bidirectional attention and applies masked mean pooling.

import torch
from transformers import AutoTokenizer, AutoModel


tokenizer = AutoTokenizer.from_pretrained('nvidia/NV-Retriever-v1')
model = AutoModel.from_pretrained('nvidia/NV-Retriever-v1', trust_remote_code=True)

query_prefix = 'Given a web search query, retrieve relevant passages that answer the query: '
document_prefix = 'passage: '


queries = [
    "how much protein should a female eat",
    "summit define",
]
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1  the highest point of a mountain : the top of a mountain. : 2  the highest level. : 3  a meeting or series of meetings between the leaders of two or more governments."
]
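# Both prefixes already end with ": ", so concatenate them without adding another space.
queries = [f"{query_prefix}{query}" for query in queries]
documents = [f"{document_prefix}{document}" for document in documents]

# The model accepts at most 512 input tokens.
batch_queries = tokenizer(queries, padding=True, truncation=True, max_length=512, return_tensors='pt')
batch_documents = tokenizer(documents, padding=True, truncation=True, max_length=512, return_tensors='pt')

# The custom module returns one pooled embedding per input string.
with torch.no_grad():
    embeddings_queries = model(**batch_queries)
    embeddings_documents = model(**batch_documents)

# Score every query against every document (rows: queries, columns: documents).
scores = embeddings_queries @ embeddings_documents.T
print(scores.tolist())
# Example output (exact values may vary with hardware, dtype, and library versions):
# [[0.6778843402862549, -0.03561091050505638], [-0.05117562413215637, 0.7305730581283569]]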
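Each row of the score matrix holds one query's similarity to every document, so retrieval amounts to sorting each row. The sketch below is a hypothetical helper (top_k_documents is not part of the model's code or of transformers) written in plain Python over the example scores shown above; with a torch tensor, torch.topk(scores, k, dim=1) serves the same purpose.

```python
from typing import List, Tuple


def top_k_documents(scores: List[List[float]], k: int = 1) -> List[List[Tuple[int, float]]]:
    """Return, for each query row, the k (document_index, score) pairs with the highest scores."""
    ranked = []
    for row in scores:
        # Sort document indices by descending score for this query.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranked.append([(j, row[j]) for j in order[:k]])
    return ranked


# Scores from the example above (rows: queries, columns: documents).
example_scores = [
    [0.6778843402862549, -0.03561091050505638],
    [-0.05117562413215637, 0.7305730581283569],
]
for query_index, hits in enumerate(top_k_documents(example_scores, k=1)):
    print(query_index, hits)
# 0 [(0, 0.6778843402862549)]
# 1 [(1, 0.7305730581283569)]
```

Each query retrieves its matching document: the protein question ranks the CDC passage first, and "summit define" ranks the dictionary definition first.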

NV-Retriever-v1 Team:

  • Mengyao Xu
  • Gabriel Moreira
  • Radek Osmulski
  • Ronay Ak
  • Benedikt Schifferer
  • Even Oldridge

Correspondence to

Benedikt Schifferer (bschifferer@nvidia.com)

Citation

@misc{moreira2024nvretrieverimprovingtextembedding,
      title={NV-Retriever: Improving text embedding models with effective hard-negative mining}, 
      author={Gabriel de Souza P. Moreira and Radek Osmulski and Mengyao Xu and Ronay Ak and Benedikt Schifferer and Even Oldridge},
      year={2024},
      eprint={2407.15831},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.15831}, 
}

License

Use of this model is governed by the NVIDIA license agreement. By downloading the release version of the model, you accept the terms and conditions of this license. For each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.

Troubleshooting

1. Access to model nvidia/NV-Retriever-v1 is restricted. You must be authenticated to access it

The model is gated: accept the access conditions on the model page, then authenticate by running huggingface-cli login with your Hugging Face access token. You can create a User Access Token on your Settings page.

2. Instruction Prompt Templates

NV-Retriever-v1 uses query and document prefixes similar to those in Improving Text Embeddings with Large Language Models (https://arxiv.org/pdf/2401.00368). Unlike that work, it does not use the “Instruct:” and “Query:” template (f'Instruct: {task_description}\nQuery: {query}'); the query prefix is only “{task_description}: ”. The prefix must end with a colon (“:”) followed by a space. The document prefix, “passage: ”, is the same for every task.

Example:

query = f"Given a web search query, retrieve relevant passages that answer the query: {query}"
document = f"passage: {document}"
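The prefix rules above can be wrapped in two small helpers so every query and document is formatted the same way. This is a minimal sketch: format_query, format_document, and QUERY_TASK are hypothetical names introduced here, not part of the model's custom code. The default task description is the web-search instruction from the example, and stripping a trailing colon from a user-supplied task description is a convenience assumption, not documented model behavior.

```python
# Hypothetical helpers (not part of the model's code) that apply the prefix convention.
QUERY_TASK = "Given a web search query, retrieve relevant passages that answer the query"
DOCUMENT_PREFIX = "passage: "


def format_query(query: str, task_description: str = QUERY_TASK) -> str:
    # Assumption: tolerate a task description that already ends with ":" or ": ".
    task = task_description.rstrip().rstrip(":")
    # Prefix is "{task_description}: " (colon attached to the last word, then one space).
    return f"{task}: {query}"


def format_document(document: str) -> str:
    # The document prefix is identical for every task.
    return f"{DOCUMENT_PREFIX}{document}"


print(format_query("summit define"))
# Given a web search query, retrieve relevant passages that answer the query: summit define
print(format_document("Definition of summit for English Language Learners."))
# passage: Definition of summit for English Language Learners.
```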

3. User Warning About Prompt

NV-Retriever-v1 expects an instruction prompt prefix on every query and document. The custom code modifies the attention_mask so that mean pooling covers only the actual text, not the prefix: it looks for token_id 28747 and masks out all positions before the first occurrence of 28747.

Because both queries and documents require a prefix containing token_id 28747, the model emits a warning if that token is not present in the input. This usually means the model is being used incorrectly.

Token_id 28747 is the colon “:” when it directly follows a word, as in “query: ”, “passage: ”, or “Represent this query: ”. If the input is “query :” (with a space before the colon), the colon maps to a different token_id. Because the custom code only looks for the first 28747 token in the input, colons inside the query or document content do not interfere with the masking.

UserWarning: Input does not contain special token 28747 to mask out instruction prompt. Please check if prefix are applied, correctly
  warnings.warn(f"Input does not contain special token {sep_token_id} to mask out instruction prompt. Please check if prefix are applied, correctly")
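The masking behavior described in this section can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the model's actual custom module: prefix_pooling_mask and masked_mean_pool are hypothetical names, and the token IDs and hidden states below are toy values. Two details are assumptions: following the wording “prior to the first appearance”, the 28747 token itself stays in the pooling mask, and when the token is missing the sketch warns and falls back to the unmodified attention mask.

```python
import warnings
from typing import List, Sequence

SEP_TOKEN_ID = 28747  # the attached colon ":" described in this card


def prefix_pooling_mask(input_ids: Sequence[int], attention_mask: Sequence[int],
                        sep_token_id: int = SEP_TOKEN_ID) -> List[int]:
    """Zero out every mask position before the first sep token (assumption: the sep token itself is kept)."""
    try:
        first_sep = list(input_ids).index(sep_token_id)
    except ValueError:
        # Assumption: without the prefix marker, warn and keep the original mask.
        warnings.warn(f"Input does not contain special token {sep_token_id} "
                      "to mask out instruction prompt.")
        return list(attention_mask)
    return [0 if i < first_sep else m for i, m in enumerate(attention_mask)]


def masked_mean_pool(hidden_states: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask value is 1."""
    count = sum(mask)
    if count == 0:
        raise ValueError("mask selects no tokens")
    totals = [0.0] * len(hidden_states[0])
    for vector, keep in zip(hidden_states, mask):
        if keep:
            for j, value in enumerate(vector):
                totals[j] += value
    return [total / count for total in totals]


# Toy example: 7 positions, the last one is padding.
input_ids = [1, 5, 6, SEP_TOKEN_ID, 10, 11, 0]
attention_mask = [1, 1, 1, 1, 1, 1, 0]
hidden_states = [[9.0, 9.0]] * 3 + [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]

mask = prefix_pooling_mask(input_ids, attention_mask)
print(mask)                                   # [0, 0, 0, 1, 1, 1, 0]
print(masked_mean_pool(hidden_states, mask))  # [3.0, 4.0]
```

The prefix positions (value 9.0) and the padding position (value 100.0) do not affect the pooled vector.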

4. Multi-GPU support

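NV-Retriever-v1 supports multi-GPU inference with torch.nn.DataParallel.

import torch

# Replicate the model on every visible GPU; each input batch is split across them.
model = torch.nn.DataParallel(model).cuda()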

Intended use

NV-Retriever-v1 is designed for users who need a high-performance embedding model for retrieval tasks.

Model Architecture

Architecture Type: Decoder-only bidirectional LLM
Network Architecture: Mistral-7B-v0.1 with Bidirectional attention masking
Pooling Type: Average (mean) pooling
Embedding Dimension: 4096
Max Input Tokens: 512
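Written out, the average pooling listed above is a masked mean over token hidden states, and the example code scores a query against a document with a dot product. The symbols are introduced here for illustration: h_i is the hidden state at position i (presumably from the final layer, an assumption), n ≤ 512 is the input length, and m_i ∈ {0, 1} is the pooling mask, which is 0 for padding and, per the Troubleshooting notes, for the instruction-prefix positions.

```latex
\mathbf{e} = \frac{\sum_{i=1}^{n} m_i \, \mathbf{h}_i}{\sum_{i=1}^{n} m_i},
\qquad \mathbf{h}_i,\ \mathbf{e} \in \mathbb{R}^{4096},\quad m_i \in \{0, 1\},\quad n \le 512

s(q, d) = \mathbf{e}_q^{\top} \mathbf{e}_d
```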

The NV-Retriever-v1 Model is based on the Mistral-7B-v0.1 architecture with a bidirectional attention masking mechanism.

Input

Input Type: Text
Input Format: List of strings, each with a task-specific instruction prefix

Output

Output Type: Floats
Output Format: List of float arrays
Other Properties Related to Output: Each array is the 4096-dimensional embedding of the corresponding input string

Model Version(s)

NV-Retriever-v1

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Model size: 7.11B params (Safetensors, tensor type FP16)