---
license: apache-2.0
language:
- en
library_name: sentence-transformers
tags:
- earth science
- climate
- biology
pipeline_tag: sentence-similarity
---

# Model Card for nasa-smd-ibm-v0.1

`nasa-smd-ibm-stv0.1` is a bi-encoder sentence transformer model adapted from the nasa-smd-ibm-v0.1 encoder model. It was trained on 271 million examples together with a domain-specific dataset of 2.6 million examples from documents curated by the NASA Science Mission Directorate (SMD). With this model, we aim to enhance natural language technologies such as information retrieval and intelligent search as they apply to SMD NLP applications.

## Model Details

- **Base Model**: nasa-smd-ibm-v0.1
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Training Strategy**: Sentence pairs, each with a score indicating relevancy. The model encodes the two sentences of a pair independently and computes the cosine similarity of the resulting embeddings. This similarity is optimized against the relevance score.

## Training Data

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ZjcHW24iKsvUYBhoL7eMM.png)

Figure: Open dataset sources for sentence transformers (271M in total)

Additionally, 2.6M abstract + title pairs were collected from NASA SMD documents.

## Training Procedure

- **Framework**: PyTorch 1.9.1
- **sentence-transformers version**: 4.30.2
- **Strategy**: Sentence Pairs

## Evaluation

The following models are evaluated:

1. All-MiniLM-l6-v2
2. BGE-base
3. roberta-base
4. A smaller version of nasa-smd-ibm_v0.1
5. nasa-smd-ibm-rtvr_v0.1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/zbeBb8mh0wgdTu1Gc-j1I.png)

## Uses

- Information Retrieval
- Sentence Similarity Search

For NASA SMD-related scientific use cases.
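
As a rough illustration of how a bi-encoder supports the retrieval and similarity-search uses listed above, the sketch below ranks candidate documents by cosine similarity to a query. It is not the model's own code: the three-dimensional vectors are hypothetical stand-ins for real embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are introduced here only for illustration. In practice, the vectors would come from encoding each text independently with this model, for example through the `encode` method of a `SentenceTransformer` object from the sentence-transformers library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for sentence embeddings.
# Real embeddings from the model would have many more dimensions.
query = [1.0, 0.0, 1.0]
documents = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partial overlap with the query
]

ranking = rank_by_similarity(query, documents)
best_index, best_score = ranking[0]

for index, score in ranking:
    print(f"document {index}: similarity {score:.3f}")
```

Because cosine similarity ignores vector length, the second document ranks first even though its magnitude differs from the query's. Since the query and the documents are encoded separately, document embeddings can be computed once and reused across queries.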

## Citation

If you find this work useful, please cite using the following bibtex citation:

```bibtex
```

## Attribution

IBM Research
- Masayasu Maraoka
- Bishwaranjan Bhattacharjee
- Aashka Trivedi

NASA SMD
- Muthukumaran Ramasubramanian
- Iksha Gurung
- Rahul Ramachandran
- Manil Maskey
- Kaylin Bugbee
- Mike Little
- Elizabeth Fancher
- Lauren Sanders
- Sylvain Costes
- Sergi Blanco-Cuaresma
- Kelly Lockhart
- Thomas Allen
- Felix Grazes
- Megan Ansdell
- Alberto Accomazzi
- Sanaz Vahidinia
- Ryan McGranaghan
- Armin Mehrabian
- Tsendgar Lee

## Disclaimer

This sentence-transformer model is currently in an experimental phase. We are working to improve the model's capabilities and performance, and as we progress, we invite the community to engage with this model, provide feedback, and contribute to its evolution.