---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: fill-mask
tags:
- climate
- biology
---

# Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is trained with masked language modeling on scientific journals and articles relevant to NASA SMD, with the aim of improving natural language technologies such as information retrieval and intelligent search.

## Model Details

- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Pretraining Strategy**: Masked Language Modeling (MLM)
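
As a rough check on the 125M figure, the sketch below counts the parameters of an encoder with RoBERTa-base dimensions. Those dimensions (12 layers, hidden size 768, feed-forward size 3,072, 514 position slots) and the stock 50,265-token RoBERTa vocabulary are assumptions: this card does not list them, and the custom tokenizer's vocabulary size will change the embedding term.

```python
"""Back-of-the-envelope parameter count for a RoBERTa-base-sized encoder.

Assumed hyperparameters (RoBERTa-base defaults, not confirmed for this model):
hidden size 768, 12 layers, FFN size 3072, 514 position embeddings, and the
stock RoBERTa vocabulary of 50,265 tokens.
"""


def roberta_param_count(vocab_size: int = 50_265, hidden: int = 768,
                        layers: int = 12, ffn: int = 3072,
                        max_positions: int = 514, type_vocab: int = 1,
                        include_pooler: bool = True) -> int:
    h = hidden
    layer_norm = 2 * h  # each LayerNorm has a weight and a bias vector

    # Embeddings: token, position, and token-type tables plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * h + layer_norm

    # Per layer: Q, K, V and output projections (weights + biases) + LayerNorm...
    attention = 4 * (h * h + h) + layer_norm
    # ...then the two-layer feed-forward block + LayerNorm.
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + layer_norm
    encoder = layers * (attention + feed_forward)

    # Optional pooler (one dense layer over the first token).
    pooler = (h * h + h) if include_pooler else 0
    return embeddings + encoder + pooler


if __name__ == "__main__":
    total = roberta_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # → 124,645,632 parameters (~125M)
```

Passing the custom tokenizer's actual vocabulary size as `vocab_size` would adjust the estimate; each extra 1,000 tokens adds 768,000 embedding parameters under these assumptions.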

## Training Data

- Wikipedia English (Feb 1, 2020)
- AGU (American Geophysical Union) publications
- AMS (American Meteorological Society) publications
- Scientific papers from the Astrophysics Data System (ADS)
- PubMed abstracts
- PMC (commercial license subset)

![Dataset Size Chart](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/CTNkn0WHS268hvidFmoqj.png)

## Training Procedure

- **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
- **Transformers Version**: 4.2.0
- **Strategy**: Masked Language Modeling (MLM)
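
The MLM objective can be illustrated with the standard BERT/RoBERTa corruption recipe, sketched below. It assumes the usual defaults (15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged); the card does not state the exact ratios used for this model. For simplicity each token is selected independently, whereas fairseq samples a fixed number of positions per sequence. The label value -100 is PyTorch's default `ignore_index` for cross-entropy loss.

```python
import random

MASK_PROB = 0.15     # fraction of tokens chosen as prediction targets (assumed default)
IGNORE_INDEX = -100  # label for positions the loss should skip


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(), rng=None):
    """Return (inputs, labels) for one MLM training example.

    Selected positions: 80% -> mask token, 10% -> random token, 10% -> unchanged.
    Special tokens (e.g. <s>, </s>, padding) are never selected.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= MASK_PROB:
            continue  # not a prediction target; loss ignores this position
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # else: keep the original token so the model cannot rely on seeing a mask
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical token ids; 4 stands in for the tokenizer's mask id.
    inputs, labels = mask_tokens(list(range(5, 25)), mask_id=4, vocab_size=1000,
                                 rng=random.Random(0))
    print(inputs)
    print(labels)
```

In a Transformers training loop, `DataCollatorForLanguageModeling` with `mlm_probability=0.15` applies the same kind of corruption per batch.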

## Evaluation

- BLURB Benchmark
- Pruned SQuAD2.0 (SQ2) Benchmark
- NASA SMD Experts Benchmark (work in progress)

![BLURB Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/K0IpQnTQmrfQJ1JXxn1B6.png)

![SQ2 Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/R4oMJquUz4puah3lvd5Ve.png)
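
SQuAD 2.0 is conventionally scored with exact match (EM) and token-level F1, where an unanswerable question is answered correctly only by an empty prediction. The sketch below follows the normalization used by the official SQuAD evaluation script; the card does not say which metric the SQ2 chart reports or how the benchmark was pruned, so treat this as background rather than the exact evaluation code.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Unanswerable case: an empty answer only matches another empty answer.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_question(prediction: str, gold_answers: list[str]) -> tuple[int, float]:
    """Best EM and F1 over all reference answers for one question."""
    if not gold_answers:  # unanswerable: the only correct prediction is ""
        gold_answers = [""]
    return (max(exact_match(prediction, g) for g in gold_answers),
            max(f1_score(prediction, g) for g in gold_answers))
```

Dataset-level EM and F1 are then the averages of these per-question scores.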

## Uses

- Named Entity Recognition (NER)
- Information Retrieval
- Sentence embeddings (e.g., with Sentence Transformers)
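
For retrieval and sentence-embedding use, an encoder like this one produces one vector per token, which must be pooled into a single vector per text before comparing texts. The sketch below assumes mean pooling over non-padding tokens (a common Sentence Transformers choice; the card does not specify a pooling method) and ranks documents by cosine similarity. Toy 3-dimensional vectors stand in for real hidden states, which in practice could come from `outputs.last_hidden_state` after loading `nasa-impact/nasa-smd-ibm-v0.1` with Transformers' `AutoModel`.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask value 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vector)]
            count += 1
    return [t / max(count, 1) for t in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # The third token is padding (mask 0), so it does not affect the pooled vector.
    query = mean_pool([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [9.0, 9.0, 9.0]], [1, 1, 0])
    docs = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    print(rank_documents(query, docs))  # → [1, 2, 0]
```

Ignoring padding in the mean keeps short texts in a padded batch from being pulled toward the padding vectors.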

## Citation

If you find this work useful, please cite it using the following BibTeX entry:

```bibtex
@misc{nasa-impact_2023,
  author    = { {NASA-IMPACT} },
  title     = { nasa-smd-ibm-v0.1 (Revision f01d42f) },
  year      = 2023,
  url       = { https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1 },
  doi       = { 10.57967/hf/1429 },
  publisher = { Hugging Face }
}
```

## Contacts

- Bishwaranjan Bhattacharjee, IBM Research
- Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)