---
license: apache-2.0
datasets:
- jxie/guacamol
- AdrianM0/MUV
library_name: transformers
---
## Model Details
We introduce a suite of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide semi-supervised learning recipes for fine-tuning these models in low-data settings.
### Enumeration-aware Molecular Transformers
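Enumeration-aware Molecular Transformers combine contrastive learning, multi-task regression, and masked language modelling as pre-training objectives to inject enumeration knowledge (the fact that one molecule can be written as many different valid SMILES strings) into pre-trained language models.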
#### a. Molecular Domain Adaptation (Contrastive Encoder-based)
##### i. Architecture
![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg)
##### ii. Contrastive Learning
<img width="1418" alt="Screenshot 2023-04-22 at 11 54 23 AM" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png">
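
The contrastive objective can be illustrated with a minimal sketch. It assumes a SimCSE-style InfoNCE loss with in-batch negatives, where two enumerated SMILES strings of the same molecule form a positive pair; the exact loss used for the released models is not stated here, so treat this as an illustration only. The function names, the toy embeddings, and the temperature of 0.05 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch (assumed objective, not confirmed).

    anchors[i] and positives[i] are embeddings of two SMILES enumerations
    of the same molecule; every positives[j] with j != i acts as an
    in-batch negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: each row stands for one enumerated SMILES string.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(view_a, view_b)
mismatched_loss = info_nce(view_a, list(reversed(view_b)))
```

The loss is small when each anchor is closest to its own enumeration and large when the pairs are mismatched, which is the pressure that pulls different SMILES strings of one molecule together in embedding space.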
#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
<img width="702" alt="Screenshot 2023-04-22 at 11 43 06 AM" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png">
### Pretraining steps for this model:
- Pretrain a BERT model with masked language modeling on the Guacamol dataset, with the masked proportion set to 15%. For more details, please see our [GitHub repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
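
The masking step can be sketched in plain Python. Only the 15% masking proportion comes from this card; the regex tokenizer, the 80/10/10 replacement rule (the standard BERT scheme), and all function names are assumptions made for illustration, and the repository linked above remains the reference for the actual pipeline.

```python
import random
import re

# Hypothetical SMILES tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens (illustrative, not the model's tokenizer)."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, vocab, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Select roughly mask_prob of the positions as prediction targets.

    Assumed BERT rule for a selected position: 80% become mask_token,
    10% become a random vocabulary token, 10% stay unchanged.
    Returns (inputs, labels); labels[i] is None for unselected positions.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = token  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise the original token is kept as input
    return inputs, labels


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sketch reproducible; in training, the mask is typically resampled every time a sequence is drawn.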
### Virtual Screening Benchmark ([GitHub Repository](https://github.com/MoleculeTransformers/rdkit-benchmarking-platform-transformers))
- Original version presented in: S. Riniker, G. Landrum, J. Cheminf., 5, 26 (2013), DOI: 10.1186/1758-2946-5-26, URL: http://www.jcheminf.com/content/5/1/26
- Extended version presented in: S. Riniker, N. Fechner, G. Landrum, J. Chem. Inf. Model., 53, 2829 (2013), DOI: 10.1021/ci400466r, URL: http://pubs.acs.org/doi/abs/10.1021/ci400466r
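
The Model List table reports AUROC and BEDROC for each model. As a rough guide to how these two numbers are obtained from a ranked screening list, here is a minimal sketch using the standard definitions. The function names are hypothetical, and `alpha=20.0` is an assumed default, since this card does not state which BEDROC weighting was used.

```python
import math


def auroc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def bedroc(scores, labels, alpha=20.0):
    """BEDROC: early-recognition metric scaled to [0, 1]; alpha is an assumed default."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n_total = len(ranked)
    n_actives = sum(1 for y in ranked if y)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive")
    # Exponentially weighted sum over the 1-based ranks of the actives.
    sum_exp = sum(
        math.exp(-alpha * rank / n_total)
        for rank, y in enumerate(ranked, start=1)
        if y
    )
    # Expected weight of a single active placed at a uniformly random rank.
    random_weight = (1 - math.exp(-alpha)) / (n_total * (math.exp(alpha / n_total) - 1))
    rie = sum_exp / (n_actives * random_weight)
    ratio = n_actives / n_total
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Toy ranking: similarity scores for five molecules, 1 = active.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
perfect = [1, 1, 0, 0, 0]
worst = [0, 0, 0, 1, 1]
mixed = [1, 0, 1, 0, 0]
```

AUROC weights every position in the ranking equally, while BEDROC rewards actives that appear near the top of the list, which is why the two columns can order models differently.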
## Model List
Our released models are listed below. You can load them with the `smiles-featurizers` package or with [HuggingFace's Transformers](https://github.com/huggingface/transformers).
| Model | Type |AUROC| BEDROC|
|:-------------------------------|:--------:|:--------:|:--------:|
| [UdS-LSV/smole-bert](https://huggingface.co/UdS-LSV/smole-bert) | `Bert`|0.615 | 0.225 |
| [UdS-LSV/smole-bert-mtr](https://huggingface.co/UdS-LSV/smole-bert-mtr) | `Bert`|0.621 | 0.262 |
| [UdS-LSV/smole-bart](https://huggingface.co/UdS-LSV/smole-bart) | `Bart`|0.660 | 0.263 |
| [UdS-LSV/muv2x-simcse-smole-bert](https://huggingface.co/UdS-LSV/muv2x-simcse-smole-bert) | `Simcse`|0.697 | 0.270 |
| [UdS-LSV/siamese-smole-bert-muv-1x](https://huggingface.co/UdS-LSV/siamese-smole-bert-muv-1x) | `SentenceTransformer`|0.673 | 0.274 |
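
As a minimal sketch of loading one of the listed checkpoints with Transformers, the snippet below uses the generic `AutoTokenizer` and `AutoModel` classes. It assumes the repository ships a tokenizer that `AutoTokenizer` can load and that `transformers` and `torch` are installed. Mean pooling over non-padding tokens is an illustrative choice, not necessarily the pooling used to produce the reported scores, and the helper names are hypothetical.

```python
MODEL_ID = "UdS-LSV/smole-bert"  # any BERT-type entry from the table above


def load_encoder(model_id=MODEL_ID):
    """Download the tokenizer and encoder from the Hugging Face Hub."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference only: disable dropout
    return tokenizer, model


def embed_smiles(smiles_list, tokenizer, model):
    """Return one vector per SMILES string via masked mean pooling (assumed pooling)."""
    import torch

    batch = tokenizer(smiles_list, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Zero out padding positions before averaging over the sequence axis.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)


if __name__ == "__main__":
    tokenizer, model = load_encoder()
    vectors = embed_smiles(["CCO", "c1ccccc1"], tokenizer, model)
    print(vectors.shape)
```

The resulting vectors can be compared with cosine similarity to rank a compound library against a query molecule, which is the setting the virtual screening benchmark evaluates.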