Model Details

We introduce a suite of neural language modelling tools for pre-training and fine-tuning SMILES-based molecular language models. We also provide recipes for fine-tuning these models in low-data settings using semi-supervised learning.
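As a minimal usage sketch (assuming the checkpoint on this page, `UdS-LSV/simcse-smole-bert-muv-mlm`, loads as a standard BERT-style encoder through Hugging Face Transformers), a SMILES string can be embedded as follows:

```python
# Minimal sketch: embed a SMILES string with the pre-trained encoder.
# Assumes the checkpoint loads as a plain BERT encoder via AutoModel.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "UdS-LSV/simcse-smole-bert-muv-mlm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
inputs = tokenizer(smiles, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings to obtain a single molecule embedding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embedding.shape)
```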

Enumeration-aware Molecular Transformers

We introduce contrastive learning alongside multi-task regression and masked language modelling as pre-training objectives to inject enumeration knowledge into pre-trained molecular language models.
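For context, "enumeration" refers to the fact that one molecule admits many valid, non-canonical SMILES strings. A hedged sketch of how such enumerated variants can be generated with RDKit (the `doRandom` flag is an assumption about the RDKit version in use; the actual data pipeline lives in the repository):

```python
# Sketch: generate enumerated (randomized) SMILES for one molecule with RDKit.
# These variants can serve as positive pairs during contrastive pre-training.
from rdkit import Chem

def enumerate_smiles(smiles: str, n: int = 5) -> list[str]:
    mol = Chem.MolFromSmiles(smiles)
    # doRandom=True asks RDKit for a random atom ordering instead of the canonical one.
    return [Chem.MolToSmiles(mol, doRandom=True) for _ in range(n)]

print(enumerate_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # several SMILES, same molecule
```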

a. Molecular Domain Adaptation (Contrastive Encoder-based)

i. Architecture

[Figure: Smole BERT architecture diagram]

ii. Contrastive Learning
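A minimal sketch of a SimCSE-style contrastive objective as it might be applied here (assumed details: in-batch negatives, cosine similarity, and a temperature of 0.05; see the repository for the actual training code):

```python
# Sketch: InfoNCE / SimCSE-style contrastive loss over a batch of molecule pairs.
# z1[i] and z2[i] are embeddings of two views (e.g. two enumerations) of molecule i.
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    # Similarity of every item in view 1 against every item in view 2.
    logits = z1 @ z2.T / temperature
    # The matching pair sits on the diagonal; all other columns act as in-batch negatives.
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```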

b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)

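As a rough illustration of the denoising setup (an assumption about the exact data pipeline), each training pair maps a randomized SMILES back to its RDKit-canonical form:

```python
# Sketch: build (noisy input, canonical target) pairs for the
# canonicalization encoder-decoder; the "noise" is SMILES enumeration.
from rdkit import Chem

def canonicalization_pair(smiles: str) -> tuple[str, str]:
    mol = Chem.MolFromSmiles(smiles)
    source = Chem.MolToSmiles(mol, doRandom=True)  # enumerated / randomized SMILES
    target = Chem.MolToSmiles(mol)                 # canonical SMILES
    return source, target

print(canonicalization_pair("CC(=O)Oc1ccccc1C(=O)O"))
```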

Pretraining steps for this model:

  • Pre-train a BERT model with multi-task regression on physicochemical properties of the GuacaMol dataset (see the sketch after this list)
  • Domain adaptation on the MUV dataset with contrastive learning and masked language modelling
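A hedged sketch of how per-molecule regression targets for the first step could be computed with RDKit (the specific descriptors below are illustrative assumptions, not necessarily the property set used for GuacaMol):

```python
# Sketch: physicochemical property targets for multi-task regression pre-training.
from rdkit import Chem
from rdkit.Chem import Descriptors

def property_targets(smiles: str) -> dict[str, float]:
    mol = Chem.MolFromSmiles(smiles)
    return {
        "mol_wt": Descriptors.MolWt(mol),
        "logp": Descriptors.MolLogP(mol),
        "tpsa": Descriptors.TPSA(mol),
        "h_donors": Descriptors.NumHDonors(mol),
        "h_acceptors": Descriptors.NumHAcceptors(mol),
    }

print(property_targets("CC(=O)Oc1ccccc1C(=O)O"))
```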

For more details, please see our GitHub repository.
