|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- jxie/guacamol |
|
- AdrianM0/MUV |
|
library_name: transformers |
|
--- |
|
## Model Details |
|
|
|
We introduce a suite of neural language modeling tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.
|
|
|
### Enumeration-aware Molecular Transformers |
|
Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.
|
#### a. Molecular Domain Adaptation (Contrastive Encoder-based) |
|
##### i. Architecture |
|
![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg) |
|
##### ii. Contrastive Learning |
|
<img width="1418" alt="Screenshot 2023-04-22 at 11 54 23 AM" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png"> |
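The contrastive objective treats embeddings of different SMILES enumerations of the same molecule as positive pairs. A minimal sketch of a SimCSE-style InfoNCE loss over a batch of such pairs (the function name, temperature, and shapes are illustrative, not taken from the repository):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.05):
    """SimCSE-style contrastive loss: z1[i] and z2[i] are embeddings of two
    enumerations of the same molecule (positives); every other pair in the
    batch serves as an in-batch negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature        # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(sim, labels)
```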
|
|
|
#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder) |
|
<img width="702" alt="Screenshot 2023-04-22 at 11 43 06 AM" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png"> |
|
|
|
### Pretraining steps for this model: |
|
|
|
- Pretrain a BERT model with masked language modeling, with the masked proportion set to 15%, on the Guacamol dataset. For more details, please see our [GitHub repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
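The 15% masking above follows the standard BERT recipe (of the selected tokens, 80% become the mask token, 10% a random token, 10% stay unchanged). A self-contained sketch of that masking step; the helper name and token ids are illustrative, and the linked repository's training code is the reference implementation:

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """BERT-style masking: select 15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = input_ids.clone()
    probability_matrix = torch.full(input_ids.shape, mlm_probability)
    masked = torch.bernoulli(probability_matrix).bool()
    labels[~masked] = -100  # loss is computed only on masked positions

    # 80% of masked positions -> [MASK]
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replaced] = mask_token_id

    # half of the remaining 20% -> random token (net 10%)
    randomized = (
        torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
        & masked & ~replaced
    )
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]
    return input_ids, labels
```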
|
|
|
### Virtual Screening Benchmark ([GitHub Repository](https://github.com/MoleculeTransformers/rdkit-benchmarking-platform-transformers))
|
|
|
Original version presented in:
S. Riniker, G. Landrum, *J. Cheminf.*, 5, 26 (2013), DOI: 10.1186/1758-2946-5-26, URL: http://www.jcheminf.com/content/5/1/26

Extended version presented in:
S. Riniker, N. Fechner, G. Landrum, *J. Chem. Inf. Model.*, 53, 2829 (2013), DOI: 10.1021/ci400466r, URL: http://pubs.acs.org/doi/abs/10.1021/ci400466r
|
|
|
## Model List |
|
|
|
Our released models are listed below. You can load them using the `smiles-featurizers` package or [Hugging Face Transformers](https://github.com/huggingface/transformers).
|
| Model | Type | AUROC | BEDROC |
|:-------------------------------|:--------:|:--------:|:--------:|
| [UdS-LSV/smole-bert](https://huggingface.co/UdS-LSV/smole-bert) | `Bert` | 0.615 | 0.225 |
| [UdS-LSV/smole-bert-mtr](https://huggingface.co/UdS-LSV/smole-bert-mtr) | `Bert` | 0.621 | 0.262 |
| [UdS-LSV/smole-bart](https://huggingface.co/UdS-LSV/smole-bart) | `Bart` | 0.660 | 0.263 |
| [UdS-LSV/muv2x-simcse-smole-bart](https://huggingface.co/UdS-LSV/muv2x-simcse-smole-bert) | `Simcse` | 0.697 | 0.270 |
| [UdS-LSV/siamese-smole-bert-muv-1x](https://huggingface.co/UdS-LSV/siamese-smole-bert-muv-1x) | `SentenceTransformer` | 0.673 | 0.274 |
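As an example, the BERT-style checkpoints above can be loaded directly with Transformers and used to featurize a SMILES string. Mean pooling over token embeddings is shown here as one common pooling choice, not a prescription:

```python
from transformers import AutoModel, AutoTokenizer

# Load a pre-trained SMILES encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("UdS-LSV/smole-bert")
model = AutoModel.from_pretrained("UdS-LSV/smole-bert")

# Featurize a SMILES string (aspirin) via mean pooling of token embeddings
smiles = "CC(=O)Oc1ccccc1C(=O)O"
inputs = tokenizer(smiles, return_tensors="pt")
outputs = model(**inputs)
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, hidden_size)
```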
|
|