---
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: fill-mask
tags:
  - climate
  - biology
---

Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, with the aim of enhancing natural language technologies such as information retrieval and intelligent search.
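
The model can be exercised with the Hugging Face fill-mask pipeline. The snippet below is a minimal usage sketch: it assumes the checkpoint loads through the standard Auto classes and uses RoBERTa's `<mask>` token, and the example sentence is purely illustrative.

```python
# Minimal fill-mask sketch. Assumes the checkpoint works with the standard
# Auto* classes and RoBERTa's <mask> token; the sentence is illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="nasa-impact/nasa-smd-ibm-v0.1")

for prediction in fill_mask("The Hubble Space Telescope observes in the <mask> spectrum."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```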

Model Details

  • Base Model: RoBERTa
  • Tokenizer: Custom
  • Parameters: 125M
  • Pretraining Strategy: Masked Language Modeling (MLM)

Training Data

  • Wikipedia English (Feb 1, 2020)
  • AGU Publications
  • AMS Publications
  • Scientific papers from the NASA Astrophysics Data System (ADS)
  • PubMed abstracts
  • PMC (commercial license subset)

[Figure: dataset size chart]

Training Procedure

  • Framework: fairseq 0.12.1 with PyTorch 1.9.1
  • Transformers version: 4.2.0
  • Strategy: Masked Language Modeling (MLM); see the illustrative masking sketch after this list
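
The released weights were pretrained with fairseq; the sketch below only illustrates the same MLM objective expressed with the Hugging Face transformers data collator. The 15% masking rate is the common RoBERTa default, not a value stated in this card.

```python
# Illustrative MLM masking/loss computation in Hugging Face transformers.
# The actual pretraining used fairseq; the 15% masking rate is an assumption
# (the standard RoBERTa default), and the sample text is made up.
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")
model = AutoModelForMaskedLM.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

texts = ["Aerosol optical depth is retrieved from MODIS observations."]
features = [tokenizer(t, truncation=True, max_length=128) for t in texts]
batch = collator(features)          # randomly masks tokens and builds MLM labels

outputs = model(**batch)            # forward pass returns the masked-LM loss
print("MLM loss:", outputs.loss.item())
```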

Evaluation

  • BLURB Benchmark
  • Pruned SQuAD2.0 (SQ2) Benchmark
  • NASA SMD Experts Benchmark (WIP)

[Figure: BLURB benchmark results]
[Figure: SQ2 benchmark results]

Uses

  • Named Entity Recognition (NER)
  • Information Retrieval
  • Sentence Transformers (see the embedding sketch after this list)
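
For retrieval and sentence-transformer style use, the encoder's token representations can be pooled into fixed-size sentence vectors. The mean-pooling strategy below is a generic assumption rather than something prescribed by this card, and the example sentences are made up.

```python
# Generic mean-pooling sketch for sentence embeddings; the pooling choice is
# an assumption, not a recipe from the model card.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")
encoder = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")

sentences = [
    "Sea surface temperature anomalies in the equatorial Pacific.",
    "El Nino events warm the eastern tropical Pacific Ocean.",
]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_states = encoder(**batch).last_hidden_state           # (batch, seq, hidden)

mask = batch["attention_mask"].unsqueeze(-1).float()             # zero out padding tokens
embeddings = (token_states * mask).sum(dim=1) / mask.sum(dim=1)  # mean pool per sentence
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```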

Citation

If you find this work useful, please cite it using the following BibTeX entry:

@misc {nasa-impact_2023,
    author       = { {NASA-IMPACT} },
    title        = { nasa-smd-ibm-v0.1 (Revision f01d42f) },
    year         = 2023,
    url          = { https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1 },
    doi          = { 10.57967/hf/1429 },
    publisher    = { Hugging Face }
}

Contacts

  • Bishwaranjan Bhattacharjee, IBM Research
  • Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)