shahrukhx01 committed
Commit 8d40725 · 1 Parent(s): a4693d4

Create README.md

Files changed (1): README.md (+25, -0)

README.md ADDED
---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---
## Model Details

We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these language models in low-data settings using semi-supervised learning.
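
As a minimal usage sketch, assuming this checkpoint is exposed as a masked language model through the `transformers` Auto classes (the repository ID below is a placeholder, not the actual model ID):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder repository ID; replace with the actual ID of this model on the Hub.
model_id = "shahrukhx01/<this-model>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Encode a SMILES string (aspirin) and run a forward pass.
smiles = "CC(=O)Oc1ccccc1C(=O)O"
inputs = tokenizer(smiles, return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```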

### Enumeration-aware Molecular Transformers
This approach introduces contrastive learning, multi-task regression, and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.
#### a. Molecular Domain Adaptation (Contrastive Encoder-based)
##### i. Architecture
![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg)
##### ii. Contrastive Learning
<img width="1418" alt="Contrastive learning over SMILES enumerations" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png">

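The repository's training code is not reproduced here; as an illustrative sketch only, an InfoNCE-style contrastive objective over pairs of SMILES enumerations of the same molecule might look like the following (the pooling, batch construction, and temperature value are assumptions, not this repository's exact implementation):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: row i of z_a and z_b embed two SMILES enumerations of the
    same molecule (positive pair); every other row in the batch is a negative."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature               # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Stand-in pooled encoder outputs: a batch of 8 molecules with 256-dim embeddings.
z_canonical = torch.randn(8, 256, requires_grad=True)   # canonical SMILES embeddings
z_enumerated = torch.randn(8, 256, requires_grad=True)  # randomized enumeration embeddings
loss = info_nce_loss(z_canonical, z_enumerated)
loss.backward()
print(loss.item())
```
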
#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
<img width="702" alt="Canonicalization encoder-decoder" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png">

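As an illustrative sketch of this canonicalization objective (a denoising encoder-decoder trained to map a randomized SMILES enumeration back to its canonical form), one possible setup with a generic `transformers` seq2seq model is shown below; the checkpoint, tokenizer, and example strings are assumptions rather than this repository's training code:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# A generic seq2seq checkpoint stands in for a SMILES-specific encoder-decoder;
# in practice a tokenizer trained on SMILES would be used instead.
model_name = "facebook/bart-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Source: a randomized (enumerated) SMILES; target: its canonical form (aspirin).
enumerated = "O=C(C)Oc1ccccc1C(=O)O"
canonical = "CC(=O)Oc1ccccc1C(=O)O"

inputs = tokenizer(enumerated, return_tensors="pt")
labels = tokenizer(canonical, return_tensors="pt").input_ids

# Standard sequence-to-sequence cross-entropy: the decoder learns to emit the
# canonical SMILES given the enumerated input.
loss = model(**inputs, labels=labels).loss
loss.backward()
```
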
### Pretraining steps for this model

- Pretrain a BERT model with masked language modelling (masking probability of 15%) on the GuacaMol dataset, as sketched below.
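
A minimal sketch of this masked-language-modelling step, assuming SMILES strings from GuacaMol are already available as a Python list and using a generic BERT configuration and tokenizer as stand-ins for this repository's exact setup:

```python
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
)

# A generic WordPiece tokenizer stands in for a SMILES-specific one.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

# 15% of tokens are masked, matching the pre-training setup described above.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

# Assumed: a small list of SMILES stands in for the GuacaMol training set.
smiles = ["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1", "CCO"]
encodings = [tokenizer(s, truncation=True, max_length=128) for s in smiles]

batch = collator(encodings)   # pads the batch and adds masked input_ids plus labels
loss = model(**batch).loss    # masked-language-modelling loss
loss.backward()
```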