---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---
## Model Details

We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these language models in low-data settings using semi-supervised learning.

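Since the card lists `library_name: transformers`, the checkpoint can be loaded with the standard Hugging Face API. The sketch below is illustrative only: the model ID is a placeholder (not stated in this card) and the SMILES string is just an example input.

```python
# Minimal loading sketch. The model ID is a placeholder for this repository's
# checkpoint and is not confirmed by the card.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "<this-checkpoint-id>"  # placeholder: replace with this repo's model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a SMILES string (aspirin) and take the [CLS] hidden state as a
# molecule-level representation.
inputs = tokenizer("CC(=O)Oc1ccccc1C(=O)O", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)
cls_embedding = outputs.hidden_states[-1][:, 0]
```
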
### Enumeration-aware Molecular Transformers
Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.

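To make the term "enumeration" concrete: the same molecule can be written as many different, equally valid SMILES strings. The sketch below generates such enumerations with RDKit; it only illustrates the concept and is not necessarily the exact enumeration procedure used for pre-training.

```python
# Illustration: SMILES enumeration with RDKit (the exact procedure used for
# pre-training is not specified in this card).
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Randomized atom orderings yield different SMILES strings for the same
# molecule -- the enumerations the pre-training objectives should be robust to.
enumerations = {Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(10)}
print(enumerations)
```
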
#### a. Molecular Domain Adaptation (Contrastive Encoder-based)
##### i. Architecture
![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg)
##### ii. Contrastive Learning
<img width="1418" alt="Contrastive learning over SMILES enumerations" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png">

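As a rough sketch of how such a contrastive objective can be set up: two enumerations of the same molecule form a positive pair, and the other molecules in the batch serve as in-batch negatives, scored with an NT-Xent-style loss. This is an illustration under those assumptions, not the exact loss used in this work.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent-style contrastive loss.

    z1[i] and z2[i] are encoder embeddings of two SMILES enumerations of
    molecule i (a positive pair); all other molecules in the batch act as
    in-batch negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: each embedding should match its own counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example with random tensors standing in for encoder outputs:
loss = nt_xent_loss(torch.randn(8, 256), torch.randn(8, 256))
```
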
#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
<img width="702" alt="Canonicalization encoder-decoder" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png">

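To make the canonicalization objective concrete, the sketch below builds (randomized SMILES → canonical SMILES) pairs with RDKit; an encoder-decoder can then be trained to map any enumeration back to the canonical form. The pairing procedure shown is an assumption for illustration only.

```python
# Illustration: build (enumerated SMILES -> canonical SMILES) pairs for a
# denoising/canonicalization seq2seq objective (assumed procedure).
from rdkit import Chem

def canonicalization_pair(smiles: str) -> tuple[str, str]:
    mol = Chem.MolFromSmiles(smiles)
    source = Chem.MolToSmiles(mol, canonical=False, doRandom=True)  # noisy input
    target = Chem.MolToSmiles(mol, canonical=True)                  # canonical target
    return source, target

src, tgt = canonicalization_pair("CC(=O)Oc1ccccc1C(=O)O")
```
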
### Pretraining steps for this model:

- Pretrain a BERT model with masked language modelling (masking proportion set to 15%) on the Guacamol dataset; a minimal sketch of this step follows below.
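A compact sketch of this step with the Hugging Face `transformers` and `datasets` libraries is shown below. The tokenizer is a placeholder and the text column name of `jxie/guacamol` is assumed; treat this as an outline rather than the exact training script.

```python
# Sketch: masked language modelling pre-training on Guacamol with a 15% masking
# probability. Tokenizer and dataset column name are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, BertConfig, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

dataset = load_dataset("jxie/guacamol", split="train")
tokenizer = AutoTokenizer.from_pretrained("<smiles-tokenizer>")  # placeholder tokenizer

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)  # "text" column assumed

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="smiles-bert-mlm", per_device_train_batch_size=64),
    data_collator=collator,
    train_dataset=tokenized,
)
trainer.train()
```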