---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: fill-mask
tags:
- climate
- biology
---
# Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is trained on scientific journals and articles relevant to NASA SMD, with the aim of enhancing natural-language technologies such as information retrieval and intelligent search.
## Model Details
- Base Model: RoBERTa
- Tokenizer: Custom
- Parameters: 125M
- Pretraining Strategy: Masked Language Modeling (MLM)
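To make the pretraining strategy concrete, below is a minimal, self-contained sketch of BERT/RoBERTa-style MLM corruption: roughly 15% of positions are chosen as prediction targets, and of those, 80% are replaced with the mask token, 10% with a random vocabulary token, and 10% left unchanged. The token list and toy vocabulary are illustrative, not the model's actual tokenizer.

```python
import random

MASK = "<mask>"  # RoBERTa-style mask token
TOY_VOCAB = ["sun", "orbit", "cloud", "ocean", "galaxy"]  # illustrative only

def mlm_corrupt(tokens, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, target_positions) under the standard
    80/10/10 MLM corruption scheme applied to ~mask_prob of positions."""
    rng = random.Random(seed)
    out, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            targets.append(i)  # loss is computed only at these positions
            r = rng.random()
            if r < 0.8:
                out[i] = MASK                    # 80%: replace with <mask>
            elif r < 0.9:
                out[i] = rng.choice(TOY_VOCAB)   # 10%: random token
            # else 10%: keep the original token
    return out, targets
```

During pretraining, the model predicts the original token at each target position; untouched positions contribute nothing to the loss.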
## Training Data
- Wikipedia English (Feb 1, 2020)
- AGU Publications
- AMS Publications
- Scientific papers from Astrophysics Data Systems
- PubMed abstracts
- PMC (commercial license subset)
## Training Procedure

- Framework: fairseq 0.12.1 with PyTorch 1.9.1
- Transformers Version: 4.2.0
- Strategy: Masked Language Modeling (MLM)
## Evaluation
- BLURB Benchmark
- Pruned SQuAD2.0 (SQ2) Benchmark
- NASA SMD Experts Benchmark (WIP)
## Uses
- Named Entity Recognition (NER)
- Information Retrieval
- Sentence Transformers
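Since the card's pipeline tag is `fill-mask`, a natural way to try the model is via the `transformers` pipeline. The sketch below is a hypothetical usage example (the model id is taken from this card; the example sentence is invented), and the weights are downloaded on first use:

```python
from transformers import pipeline

def fill_mask_top_k(text: str, k: int = 5):
    """Return (token, score) pairs for a single <mask> slot in `text`."""
    fill = pipeline("fill-mask", model="nasa-impact/nasa-smd-ibm-v0.1")
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=k)]

if __name__ == "__main__":
    # RoBERTa tokenizers use "<mask>" as the mask token.
    print(fill_mask_top_k("Sea surface <mask> anomalies are tracked by satellite."))
```

For retrieval or NER, the same checkpoint would instead be loaded as an encoder backbone (e.g. `AutoModel.from_pretrained(...)`) and fine-tuned on the downstream task.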
## Citation

If you find this work useful, please cite it using the following BibTeX entry:
```bibtex
@misc{nasa-impact_2023,
  author    = {{NASA-IMPACT}},
  title     = {nasa-smd-ibm-v0.1 (Revision f01d42f)},
  year      = 2023,
  url       = {https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1},
  doi       = {10.57967/hf/1429},
  publisher = {Hugging Face}
}
```
## Contacts
- Bishwaranjan Bhattacharjee, IBM Research
- Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)