metadata

license: apache-2.0
pipeline_tag: text-classification

Description:

nasa-smd-ibm-ranker (INDUS Ranker) is an encoder-based model that takes a search query and a passage and computes how relevant the passage is to the query. It is used together with a sentence transformer: the sentence transformer retrieves candidate passages, and the ranker re-ranks them, thereby improving the relevance of information retrieval results.

The model is fine-tuned on MS MARCO and tested on science QA datasets.

The model is an integral part of the neural search information retrieval process used by the Science Discovery Engine, along with the fine-tuned sentence transformer (https://huggingface.co/nasa-impact/nasa-smd-ibm-st-v2).
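The two-stage flow described above (retrieve with a sentence transformer, then re-rank with this model) can be sketched as follows. This is a minimal illustration, not the Science Discovery Engine's implementation: `retrieve_then_rerank`, `top_k`, and the parameters `first_stage_k` and `final_k` are hypothetical names, and both scoring functions are passed in as plain callables so that any retriever and any ranker can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passages) to one relevance score per passage.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def top_k(passages: Sequence[str], scores: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (passage, score) pairs, best first."""
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], scores[i]) for i in order[:k]]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    retrieve_score: ScoreFn,   # assumption: e.g. embedding similarity from a sentence transformer
    rerank_score: ScoreFn,     # assumption: e.g. query-passage scores from the ranker
    first_stage_k: int = 30,   # hypothetical default: "a few dozen" candidates
    final_k: int = 5,          # hypothetical default
) -> List[Tuple[str, float]]:
    """Keep the first_stage_k best retrieval hits, then re-order them with the ranker."""
    first_stage = top_k(corpus, retrieve_score(query, corpus), first_stage_k)
    candidates = [passage for passage, _ in first_stage]
    return top_k(candidates, rerank_score(query, candidates), final_k)
```

Only the first-stage candidates are passed to the ranker, which keeps the more expensive query-passage scoring limited to a short list.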

Evaluation:

Model evaluation on the MS MARCO dev set and NASA-QA:

Intended uses & limitations

The query and the passage together must fit within 512 tokens, including the [CLS] and [SEP] special tokens. The intended use is to re-rank the first few dozen results of an embedding search.
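The length limit can be checked with simple arithmetic before scoring. The sketch below is illustrative only: the function names are hypothetical, and the default of three special tokens assumes a `[CLS] query [SEP] passage [SEP]` layout. The actual count depends on the tokenizer; with a Hugging Face tokenizer it can be read from `tokenizer.num_special_tokens_to_add(pair=True)`.

```python
def passage_token_budget(query_tokens: int, max_length: int = 512, special_tokens: int = 3) -> int:
    """Number of tokens left for the passage once the query and special tokens are counted."""
    budget = max_length - special_tokens - query_tokens
    if budget <= 0:
        raise ValueError("The query alone uses up the whole input window.")
    return budget


def pair_fits(query_tokens: int, passage_tokens: int, max_length: int = 512, special_tokens: int = 3) -> bool:
    """True if the query, the passage and the special tokens fit in max_length tokens."""
    return query_tokens + passage_tokens + special_tokens <= max_length
```

For example, under these assumptions a 20-token query leaves 512 - 3 - 20 = 489 tokens for the passage; longer passages have to be truncated or split before they are scored.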

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the ranker checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-ranker")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/nasa-smd-ibm-ranker")
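Once the tokenizer and model are loaded, one possible way to score and re-rank candidate passages is sketched below. The helper names `score_passages`, `rank_by_score`, and `demo` are hypothetical, as are the example query and passages. The sketch also assumes that the last column of the classification logits is the relevance score (which covers both a single-logit head and a two-class head with the positive class last); this card does not state the head layout, so check `model.config.num_labels` before relying on it.

```python
from typing import List, Sequence, Tuple


def score_passages(query: str, passages: Sequence[str], tokenizer, model, max_length: int = 512) -> List[float]:
    """Score each (query, passage) pair with a sequence-classification model."""
    import torch  # imported here so rank_by_score stays usable without torch installed

    inputs = tokenizer(
        [query] * len(passages),   # first segment: the query, repeated per passage
        list(passages),            # second segment: the candidate passages
        padding=True,
        truncation=True,           # cut pairs that exceed the 512-token limit
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumption: the last logit column is the relevance score.
    return logits[:, -1].tolist()


def rank_by_score(passages: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair passages with their scores and sort them, most relevant first."""
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], scores[i]) for i in order]


def demo() -> None:
    """Download the checkpoint and re-rank two made-up passages."""
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "nasa-impact/nasa-smd-ibm-ranker"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    query = "What heats the solar corona?"
    passages = [
        "Coronal heating is an open problem in solar physics.",
        "Sea ice extent is measured from satellite data.",
    ]
    for passage, score in rank_by_score(passages, score_passages(query, passages, tokenizer, model)):
        print(f"{score:.3f}  {passage}")
```

Calling `demo()` requires network access to download the checkpoint. In a search pipeline, `passages` would be the candidates returned by the sentence transformer.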

Note

This ranker model is released in support of the training and evaluation of the encoder language model "Indus".

The accompanying paper is available at: https://arxiv.org/abs/2405.10725

Cite this Model

@misc {nasa-impact_2024,
    author       = { {NASA-IMPACT} },
    title        = { nasa-smd-ibm-ranker (Revision 4f42d19) },
    year         = 2024,
    url          = { https://huggingface.co/nasa-impact/nasa-smd-ibm-ranker },
    doi          = { 10.57967/hf/1849 },
    publisher    = { Hugging Face }
}