license: apache-2.0
pipeline_tag: text-classification
Description:
nasa-smd-ibm-ranker (INDUS Ranker) is an encoder-based model that takes a search query and a passage and calculates the relevance of the passage to the query. It is used in conjunction with sentence transformers to re-rank the passages retrieved by the sentence transformer, thereby improving the relevance of information retrieval results.
The model is fine-tuned on MS MARCO and tested on science QA datasets.
The model is an integral part of the Neural Search information retrieval process used by the Science Discovery Engine, along with the fine-tuned sentence transformer (https://huggingface.co/nasa-impact/nasa-smd-ibm-st-v2).
Evaluation:
Model evaluation on the MS MARCO dev set and NASA-QA:
Intended uses & limitations
The query and passage together must fit within 512 tokens (including the [CLS] and [SEP] special tokens). The intended use is to rerank the first few dozen embedding search results.
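As a minimal sketch of working within the 512-token limit, the helpers below estimate how many passage tokens remain once the query and special tokens are counted, and encode a pair so that only the passage is truncated. The names `passage_token_budget` and `encode_pair` are hypothetical and not part of this model's release; `tokenizer` is assumed to be a Hugging Face tokenizer such as the one loaded under "How to use". The number of special tokens is read from the tokenizer rather than hard-coded.

```python
def passage_token_budget(tokenizer, query, max_length=512):
    """Return how many passage tokens fit once the query and special tokens are counted."""
    query_tokens = len(tokenizer.tokenize(query))
    # Ask the tokenizer how many special tokens it adds to a (query, passage) pair.
    special_tokens = tokenizer.num_special_tokens_to_add(pair=True)
    return max(0, max_length - query_tokens - special_tokens)


def encode_pair(tokenizer, query, passage, max_length=512):
    """Encode a (query, passage) pair, truncating only the passage.

    Assumes the query alone is well under max_length.
    """
    return tokenizer(
        query,
        passage,
        truncation="only_second",  # keep the query whole, trim the passage
        max_length=max_length,
        return_tensors="pt",
    )
```

Passages longer than the remaining budget lose their tail under this scheme, so splitting long documents into shorter passages before ranking may be preferable.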
How to use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-ranker")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/nasa-smd-ibm-ranker")
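The loading code above stops short of scoring. The following is a sketch of one way to rerank candidate passages, not the authors' reference implementation. The helpers `rerank` and `make_score_fn` are hypothetical names introduced for illustration. The layout of the classification head is not stated in this card, so the code assumes either a single relevance logit or, if there are several labels, that the last label means "relevant"; check `model.config.num_labels` before relying on it.

```python
def rerank(query, passages, score_fn, top_k=None):
    """Sort passages by descending score from score_fn(query, passages)."""
    scores = score_fn(query, passages)
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def make_score_fn(tokenizer, model, max_length=512):
    """Build a scorer from a loaded tokenizer and sequence-classification model."""
    import torch  # imported lazily so rerank() can be used without torch

    def score_fn(query, passages):
        # Pair the same query with every candidate passage.
        inputs = tokenizer(
            [query] * len(passages),
            list(passages),
            padding=True,
            truncation="only_second",
            max_length=max_length,
            return_tensors="pt",
        )
        model.eval()
        with torch.no_grad():
            logits = model(**inputs).logits
        # ASSUMPTION: one logit per pair is a relevance score; otherwise the
        # last label is treated as "relevant" and its probability is used.
        if logits.shape[-1] == 1:
            return logits.squeeze(-1).tolist()
        return torch.softmax(logits, dim=-1)[:, -1].tolist()

    return score_fn
```

With the `tokenizer` and `model` loaded above, `rerank(query, candidates, make_score_fn(tokenizer, model), top_k=10)` would return the ten highest-scoring `(passage, score)` pairs, where `candidates` is the list of passages returned by the sentence transformer search.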
Note
This ranker model is released in support of the training and evaluation of the encoder language model "Indus".
The accompanying paper is available at https://arxiv.org/abs/2405.10725
Cite this Model
@misc {nasa-impact_2024,
author = { {NASA-IMPACT} },
title = { nasa-smd-ibm-ranker (Revision 4f42d19) },
year = 2024,
url = { https://huggingface.co/nasa-impact/nasa-smd-ibm-ranker },
doi = { 10.57967/hf/1849 },
publisher = { Hugging Face }
}