|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- jxie/guacamol |
|
- AdrianM0/MUV |
|
library_name: transformers |
|
--- |
|
## Model Details |
|
|
|
We introduce a suite of neural language modeling tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.
|
|
|
### Enumeration-aware Molecular Transformers |
|
Introduces contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained language models.
|
#### a. Molecular Domain Adaptation (Contrastive Encoder-based) |
|
##### i. Architecture |
|
![smole bert drawio](https://user-images.githubusercontent.com/6007894/233776921-41667331-1ab7-413c-92f7-4e6fad512f5c.svg) |
|
##### ii. Contrastive Learning |
|
<img width="1418" alt="Screenshot 2023-04-22 at 11 54 23 AM" src="https://user-images.githubusercontent.com/6007894/233777069-439c18cc-77a2-4ae2-a81e-d7e94c30a6be.png"> |
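The contrastive objective treats embeddings of different SMILES enumerations of the same molecule as positive pairs. A minimal sketch of a SimCSE-style InfoNCE loss over a batch of such pairs (the function name, temperature, and shapes are illustrative, not taken from the repository):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.05):
    """SimCSE-style contrastive loss: z1[i] and z2[i] are embeddings of two
    enumerations of the same molecule (positives); every other pair in the
    batch serves as an in-batch negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature        # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(sim, labels)
```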
|
|
|
#### b. Canonicalization Encoder-decoder (Denoising Encoder-decoder) |
|
<img width="702" alt="Screenshot 2023-04-22 at 11 43 06 AM" src="https://user-images.githubusercontent.com/6007894/233776512-ab6cdeef-02f1-4076-9b76-b228cbf26456.png"> |
|
|
|
### Pretraining steps for this model: |
|
|
|
- Pretrain a BERT model with masked language modeling, with the masked proportion set to 15%, on the Guacamol dataset. For more details, please see our [GitHub repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
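The 15% masking above follows the standard BERT recipe (of the selected tokens, 80% become the mask token, 10% a random token, 10% stay unchanged). A self-contained sketch of that masking step; the helper name and token ids are illustrative, and the linked repository's training code is the reference implementation:

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """BERT-style masking: select 15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = input_ids.clone()
    probability_matrix = torch.full(input_ids.shape, mlm_probability)
    masked = torch.bernoulli(probability_matrix).bool()
    labels[~masked] = -100  # loss is computed only on masked positions

    # 80% of masked positions -> [MASK]
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replaced] = mask_token_id

    # half of the remaining 20% -> random token (net 10%)
    randomized = (
        torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
        & masked & ~replaced
    )
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]
    return input_ids, labels
```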
|
|
|
### Virtual Screening Benchmark ([GitHub Repository](https://github.com/MoleculeTransformers/rdkit-benchmarking-platform-transformers))
|
|
|
Original version presented in:
S. Riniker, G. Landrum, *J. Cheminf.*, 5, 26 (2013), DOI: 10.1186/1758-2946-5-26, URL: http://www.jcheminf.com/content/5/1/26

Extended version presented in:
S. Riniker, N. Fechner, G. Landrum, *J. Chem. Inf. Model.*, 53, 2829 (2013), DOI: 10.1021/ci400466r, URL: http://pubs.acs.org/doi/abs/10.1021/ci400466r
|
|
|
## Model List |
|
|
|
Our released models are listed below. You can load them using the `smiles-featurizers` package or [Hugging Face Transformers](https://github.com/huggingface/transformers).
|
| Model | Type | AUROC | BEDROC |
|:-------------------------------|:--------:|:--------:|:--------:|
| [UdS-LSV/smole-bert](https://huggingface.co/UdS-LSV/smole-bert) | `Bert` | 0.615 | 0.225 |
| [UdS-LSV/smole-bert-mtr](https://huggingface.co/UdS-LSV/smole-bert-mtr) | `Bert` | 0.621 | 0.262 |
| [UdS-LSV/smole-bart](https://huggingface.co/UdS-LSV/smole-bart) | `Bart` | 0.660 | 0.263 |
| [UdS-LSV/muv2x-simcse-smole-bart](https://huggingface.co/UdS-LSV/muv2x-simcse-smole-bert) | `Simcse` | 0.697 | 0.270 |
| [UdS-LSV/siamese-smole-bert-muv-1x](https://huggingface.co/UdS-LSV/siamese-smole-bert-muv-1x) | `SentenceTransformer` | 0.673 | 0.274 |
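As an example, the BERT-style checkpoints above can be loaded directly with Transformers and used to featurize a SMILES string. Mean pooling over token embeddings is shown here as one common pooling choice, not a prescription:

```python
from transformers import AutoModel, AutoTokenizer

# Load a pre-trained SMILES encoder from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("UdS-LSV/smole-bert")
model = AutoModel.from_pretrained("UdS-LSV/smole-bert")

# Featurize a SMILES string (aspirin) via mean pooling of token embeddings
smiles = "CC(=O)Oc1ccccc1C(=O)O"
inputs = tokenizer(smiles, return_tensors="pt")
outputs = model(**inputs)
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, hidden_size)
```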
|
|