Muthukumaran committed on
Commit
08ac2b4
1 Parent(s): edb0ca5

Update README.md

Files changed (1)
  1. README.md +88 -0
README.md CHANGED
@@ -1,3 +1,91 @@
1
  ---
2
  license: apache-2.0
3
  ---
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: sentence-transformers
6
+ tags:
7
+ - earth science
8
+ - climate
9
+ - biology
10
+ pipeline_tag: sentence-similarity
11
  ---
12
+
13
+ # Model Card for nasa-smd-ibm-stv0.1
14
+
15
+ `nasa-smd-ibm-stv0.1` is a bi-encoder sentence transformer model adapted from the nasa-smd-ibm-v0.1 encoder model. It was trained on 271 million examples from open datasets along with a domain-specific dataset of 2.6 million examples from documents curated by the NASA Science Mission Directorate (SMD). With this model, we aim to enhance natural language technologies such as information retrieval and intelligent search as they apply to SMD NLP applications.
16
+
17
+ ## Model Details
18
+ - **Base Model**: nasa-smd-ibm-v0.1
19
+ - **Tokenizer**: Custom
20
+ - **Parameters**: 125M
21
+ - **Training Strategy**: Sentence pairs, each with a score indicating relevance. The model encodes the two sentences of each pair independently and computes their cosine similarity, which is then optimized against the relevance score.
22
+
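The training strategy above can be illustrated with a minimal sketch. It assumes a squared-error objective between the cosine similarity and the relevance score, which the card does not specify. The toy vectors stand in for the encoder's sentence embeddings, and the helper names `cosine_similarity` and `pair_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(emb_a, emb_b, relevance):
    """Squared error between the predicted similarity and the labelled
    relevance score (an assumed objective, not confirmed by the card)."""
    return (cosine_similarity(emb_a, emb_b) - relevance) ** 2


# Toy embeddings standing in for the two independently encoded sentences.
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 1.0, 0.0]

print(cosine_similarity(emb_a, emb_b))
print(pair_loss(emb_a, emb_b, relevance=1.0))
```

Because each sentence is encoded on its own, embeddings can be computed once and reused across many comparisons, which is what makes a bi-encoder practical for search.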
23
+ ## Training Data
24
+
25
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ZjcHW24iKsvUYBhoL7eMM.png)
26
+ Figure: Open dataset sources for sentence transformers (271M examples in total)
27
+
28
+ Additionally, 2.6M abstract and title pairs were collected from NASA SMD documents.
29
+
30
+
31
+ ## Training Procedure
32
+ - **Framework**: PyTorch 1.9.1
33
+ - **sentence-transformers version**: 4.30.2
34
+ - **Strategy**: Sentence Pairs
35
+
36
+ ## Evaluation
37
+ The following models are evaluated:
38
+
39
+ 1. all-MiniLM-L6-v2
40
+ 2. BGE-base
41
+ 3. roberta-base
42
+ 4. A smaller version of nasa-smd-ibm_v0.1
43
+ 5. nasa-smd-ibm-rtvr_v0.1
44
+
45
+
46
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/zbeBb8mh0wgdTu1Gc-j1I.png)
47
+
48
+ ## Uses
49
+ - Information Retrieval
50
+ - Sentence Similarity Search
51
+
52
+ Intended for NASA SMD-related scientific use cases.
53
+
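As a rough sketch of the retrieval use case, documents can be ranked by the cosine similarity between their embeddings and a query embedding. The vectors below are made-up placeholders rather than model outputs, and `rank_by_similarity` is a hypothetical helper. In practice the embeddings would come from encoding text with this model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_emb, doc_embs, top_k=2):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_emb, emb))
        for doc_id, emb in doc_embs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings; real ones would be produced by the encoder.
doc_embs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query_emb = [1.0, 0.0, 0.0]

results = rank_by_similarity(query_emb, doc_embs)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

For large collections, an exhaustive scan like this is usually replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.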
54
+ ## Citation
55
+ If you find this work useful, please cite it using the following BibTeX entry:
56
+
57
+ ```bibtex
58
+
59
+ ```
60
+
61
+ ## Attribution
62
+
63
+ IBM Research
64
+ - Masayasu Muraoka
65
+ - Bishwaranjan Bhattacharjee
66
+ - Aashka Trivedi
67
+
68
+ NASA SMD
69
+ - Muthukumaran Ramasubramanian
70
+ - Iksha Gurung
71
+ - Rahul Ramachandran
72
+ - Manil Maskey
73
+ - Kaylin Bugbee
74
+ - Mike Little
75
+ - Elizabeth Fancher
76
+ - Lauren Sanders
77
+ - Sylvain Costes
78
+ - Sergi Blanco-Cuaresma
79
+ - Kelly Lockhart
80
+ - Thomas Allen
81
+ - Felix Grezes
82
+ - Megan Ansdell
83
+ - Alberto Accomazzi
84
+ - Sanaz Vahidinia
85
+ - Ryan McGranaghan
86
+ - Armin Mehrabian
87
+ - Tsengdar Lee
88
+
89
+ ## Disclaimer
90
+
91
+ This sentence-transformer model is currently in an experimental phase. We are working to improve the model's capabilities and performance, and as we progress, we invite the community to engage with this model, provide feedback, and contribute to its evolution.