---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: fill-mask
tags:
- climate
- biology
---

# Model Card for nasa-smd-ibm-v0.1

nasa-smd-ibm-v0.1 is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It was trained on scientific journals and articles relevant to NASA SMD, with the aim of enhancing natural language technologies such as information retrieval and intelligent search.

## Model Details

- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Pretraining Strategy**: Masked Language Modeling (MLM)

## Training Data

- Wikipedia English (Feb 1, 2020)
- AGU Publications
- AMS Publications
- Scientific papers from Astrophysics Data Systems
- PubMed abstracts
- PMC (commercial license subset)

![Dataset Size Chart](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/CTNkn0WHS268hvidFmoqj.png)

## Training Procedure

- **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
- **Transformers Version**: 4.2.0
- **Strategy**: Masked Language Modeling (MLM)

## Evaluation

- BLURB Benchmark
- Pruned SQuAD2.0 (SQ2) Benchmark
- NASA SMD Experts Benchmark (work in progress)

![BLURB Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/K0IpQnTQmrfQJ1JXxn1B6.png)

![SQ2 Benchmark Results](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/R4oMJquUz4puah3lvd5Ve.png)

## Uses

- Named Entity Recognition (NER)
- Information Retrieval
- Sentence Transformers

(Usage sketches for the fill-mask task and for embedding extraction are given at the end of this card.)

## Citation

If you find this work useful, please cite it using the following BibTeX entry:

```bibtex
@misc{nasa-impact_2023,
  author    = {{NASA-IMPACT}},
  title     = {nasa-smd-ibm-v0.1 (Revision f01d42f)},
  year      = 2023,
  url       = {https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1},
  doi       = {10.57967/hf/1429},
  publisher = {Hugging Face}
}
```

## Contacts

- Bishwaranjan Bhattacharjee, IBM Research
- Muthukumaran Ramasubramanian, NASA-IMPACT (mr0051@uah.edu)
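
## Example: Fill-Mask

The card's `pipeline_tag` is `fill-mask`, so a minimal sketch of querying the model through the `transformers` pipeline may be helpful. The repository id below is taken from the citation URL above; the example sentence is an illustrative placeholder, and the mask token is read from the checkpoint's custom tokenizer rather than hard-coded.

```python
from transformers import pipeline

# Load the model and its custom tokenizer from the Hub.
# Repository id taken from the citation URL on this card.
fill_mask = pipeline("fill-mask", model="nasa-impact/nasa-smd-ibm-v0.1")

# Use the tokenizer's own mask token, since this checkpoint
# ships a custom tokenizer.
mask = fill_mask.tokenizer.mask_token

# Example sentence (placeholder); prints the top predicted fills.
for prediction in fill_mask(f"The {mask} mission studied the Martian atmosphere."):
    print(f"{prediction['token_str']:>15}  {prediction['score']:.3f}")
```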
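## Example: Sentence Embeddings for Retrieval

Since the Uses section lists information retrieval and sentence transformers, the sketch below shows one common way to obtain sentence embeddings from an encoder-only model: mean-pooling the last hidden state over non-padding tokens. This pooling recipe is an assumption for illustration, not necessarily the authors' setup, and the example sentences are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")
model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-v0.1")
model.eval()

# Placeholder sentences for illustration.
sentences = [
    "The Hubble Space Telescope observed a distant galaxy cluster.",
    "Sea surface temperature anomalies were measured by satellite.",
]

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, dim)
    # Mean-pool over non-padding tokens to get one vector per sentence.
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq_len, 1)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence embeddings.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {sim:.3f}")
```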