metadata

license: apache-2.0
language:
  - en
library_name: sentence-transformers
tags:
  - earth science
  - climate
  - biology
pipeline_tag: sentence-similarity

Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-stv0.1 is a bi-encoder sentence transformer model adapted from the nasa-smd-ibm-v0.1 encoder model. It is trained on 271 million examples from open datasets, along with a domain-specific dataset of 2.6 million examples from documents curated by the NASA Science Mission Directorate (SMD). With this model, we aim to enhance natural language technologies such as information retrieval and intelligent search as they apply to SMD NLP applications.

Model Details

Base Model: nasa-smd-ibm-v0.1
Tokenizer: Custom
Parameters: 125M
Training Strategy: Sentence pairs with a score indicating relevancy. The model encodes the two sentences of each pair independently and calculates the cosine similarity between the resulting embeddings. The similarity is then optimized against the relevance score.
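The training strategy above can be illustrated with a minimal sketch. It assumes a squared-error objective between the cosine similarity and the relevance score; the card does not state the exact loss, so `pair_loss` is a hypothetical helper and the three-dimensional vectors are toy stand-ins for real encoder outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(emb_a, emb_b, relevance):
    """Squared error between predicted similarity and the relevance label.

    Assumption: the actual training objective is not given in the card;
    squared error is used here only for illustration.
    """
    return (cosine_similarity(emb_a, emb_b) - relevance) ** 2


# Toy embeddings; in a bi-encoder each sentence is encoded independently.
emb_query = [1.0, 0.0, 0.0]
emb_relevant = [1.0, 0.0, 0.0]
emb_unrelated = [0.0, 1.0, 0.0]

# Both pairs are labeled as fully relevant (score 1.0).
loss_good = pair_loss(emb_query, emb_relevant, relevance=1.0)
loss_bad = pair_loss(emb_query, emb_unrelated, relevance=1.0)
```

Under this assumed objective, a pair whose embeddings already agree with its label contributes no loss, while a relevant pair with orthogonal embeddings is penalized, which pushes the encoder to move those embeddings closer together.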

Training Data

Figure: Open dataset sources for sentence transformers (271M in total)

Additionally, 2.6M abstract and title pairs were collected from NASA SMD documents.

Training Procedure

Framework: PyTorch 1.9.1
sentence-transformers version: 4.30.2
Strategy: Sentence Pairs

Evaluation

The following models are evaluated:

All-MiniLM-l6-v2
BGE-base
roberta-base
A smaller version of nasa-smd-ibm_v0.1
nasa-smd-ibm-rtvr_v0.1

Uses

Information Retrieval
Sentence Similarity Search

For NASA SMD-related scientific use cases.
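As a rough illustration of the retrieval use case, the sketch below ranks documents by cosine similarity to a query embedding. The two-dimensional vectors are made-up placeholders and `rank_documents` is a hypothetical helper, not part of any library. In practice the embeddings would come from encoding text with this model, for example through the `encode` method of a sentence-transformers `SentenceTransformer` instance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_emb, doc_embs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scored = [(i, cosine_similarity(query_emb, d)) for i, d in enumerate(doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings standing in for encoded documents and a query.
doc_embs = [
    [0.0, 1.0],  # document 0
    [1.0, 0.0],  # document 1
    [1.0, 1.0],  # document 2
]
query_emb = [1.0, 0.1]

ranking = rank_documents(query_emb, doc_embs)
ranked_indices = [index for index, _ in ranking]
```

Because a bi-encoder embeds documents independently of the query, document embeddings can be computed once and reused, so only the query needs encoding at search time.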

Citation

If you find this work useful, please cite it using the following BibTeX citation:

Attribution

IBM Research

Masayasu Muraoka
Bishwaranjan Bhattacharjee
Aashka Trivedi

NASA SMD

Muthukumaran Ramasubramanian
Iksha Gurung
Rahul Ramachandran
Manil Maskey
Kaylin Bugbee
Mike Little
Elizabeth Fancher
Lauren Sanders
Sylvain Costes
Sergi Blanco-Cuaresma
Kelly Lockhart
Thomas Allen
Felix Grezes
Megan Ansdell
Alberto Accomazzi
Sanaz Vahidinia
Ryan McGranaghan
Armin Mehrabian
Tsengdar Lee

Disclaimer

This sentence-transformer model is currently in an experimental phase. We are working to improve the model's capabilities and performance, and as we progress, we invite the community to engage with this model, provide feedback, and contribute to its evolution.