library_name: tf-keras

Model description

This repo contains the model and the notebook for the Keras example on Drug Molecule Generation with VAE.

Full credits go to Victor Basu

Reproduced by Vu Minh Chien

Motivation: Using a Variational Autoencoder to generate molecules for drug discovery. Automatic chemical design using a data-driven continuous representation of molecules generates new molecules via efficient exploration of open-ended spaces of chemical compounds. The model consists of three components: Encoder, Decoder, and Predictor. The Encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the Decoder converts these continuous vectors back to discrete molecule representations. The Predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations allow the use of gradient-based optimization to efficiently guide the search for optimized functional compounds.
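The step that makes the Encoder's output usable for gradient-based optimization can be sketched in a few lines. This is a minimal illustration, assuming the usual diagonal-Gaussian VAE formulation in which the Encoder outputs a mean and a log-variance per latent dimension; the function names `sample_latent` and `kl_divergence` are hypothetical and are not taken from the notebook in this repo.

```python
import math
import random


def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + exp(0.5 * log_var) * eps, eps ~ N(0, 1).

    Sampling is written as a deterministic function of (mean, log_var) plus
    external noise, so gradients can flow back into the Encoder.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(z_mean, z_log_var)
    ]


def kl_divergence(z_mean, z_log_var):
    """KL(N(mean, var) || N(0, 1)), summed over latent dimensions.

    This term regularizes the latent space toward a standard normal prior.
    """
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv)
        for m, lv in zip(z_mean, z_log_var)
    )


# Illustrative values only (assumption): a 2-dimensional latent vector.
rng = random.Random(0)
z = sample_latent([0.5, -1.0], [0.0, 0.0], rng)
kl = kl_divergence([0.5, -1.0], [0.0, 0.0])
```

In a full model, the Decoder and the Predictor would both take `z` as input, and the KL term would be added to the reconstruction and property-prediction losses.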


Intended uses & limitations

In this example, RDKit is used to conveniently and efficiently transform SMILES into molecule objects, and then from those obtain sets of atoms and bonds. SMILES expresses the structure of a given molecule in the form of an ASCII string. The SMILES string is a compact encoding that, for smaller molecules, is relatively human-readable. Encoding molecules as strings makes it easier to search for a given molecule in databases and on the web. RDKit parses a given SMILES string into a molecule object, which can then be used to compute a large number of molecular properties and features.
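The SMILES-to-molecule step described above can be sketched as follows. This is a minimal illustration, assuming RDKit is installed (for example via `pip install rdkit`); the helper name `atoms_and_bonds` is hypothetical, and ethanol (`CCO`) is used only as a small example input.

```python
from rdkit import Chem


def atoms_and_bonds(smiles):
    """Parse a SMILES string and return its atom symbols and bonds.

    Bonds are returned as (begin_atom_index, end_atom_index, bond_type) tuples.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None when the string cannot be parsed.
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    atoms = [atom.GetSymbol() for atom in mol.GetAtoms()]
    bonds = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), str(bond.GetBondType()))
        for bond in mol.GetBonds()
    ]
    return atoms, bonds


# Ethanol: two carbons and one oxygen (hydrogens are implicit in SMILES).
atoms, bonds = atoms_and_bonds("CCO")
```

The atom and bond lists are the kind of raw material from which a graph or feature representation of the molecule can be built before it is passed to the Encoder.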

Training and evaluation data

The dataset "ZINC – A Free Database of Commercially Available Compounds for Virtual Screening" was used in this tutorial. The dataset provides molecules in SMILES representation along with their respective molecular properties such as logP (water–octanol partition coefficient), SAS (synthetic accessibility score), and QED (Quantitative Estimate of Drug-likeness).

Model Plot

[Image: model plot]

Output samples

Latent space samples

[Image: latent spaces]

[Image: samples]