|
Metadata-Version: 2.1 |
|
Name: mhg-gnn |
|
Version: 0.0 |
|
Summary: Package for mhg-gnn |
|
Author: team |
|
License: TBD |
|
Classifier: Programming Language :: Python :: 3 |
|
Classifier: Programming Language :: Python :: 3.9 |
|
Description-Content-Type: text/markdown |
|
Requires-Dist: networkx>=2.8 |
|
Requires-Dist: numpy<2.0.0,>=1.23.5 |
|
Requires-Dist: pandas>=1.5.3 |
|
Requires-Dist: rdkit-pypi<2023.9.6,>=2022.9.4 |
|
Requires-Dist: torch>=2.0.0 |
|
Requires-Dist: torchinfo>=1.8.0 |
|
Requires-Dist: torch-geometric>=2.3.1 |
|
|
|
# mhg-gnn |
|
|
|
This repository provides PyTorch source code assosiated with our publication, "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network" |
|
|
|
**Paper:** [Arxiv Link](https://arxiv.org/pdf/2309.16374) |
|
|
|
For more information contact: SEIJITKD@jp.ibm.com |
|
|
|
![mhg-gnn](images/mhg_example1.png) |
|
|
|
## Introduction |
|
|
|
We present MHG-GNN, an autoencoder architecture |
|
that has an encoder based on GNN and a decoder based on a sequential model with MHG. |
|
Since the encoder is a GNN variant, MHG-GNN can accept any molecule as input, and |
|
demonstrate high predictive performance on molecular graph data. |
|
In addition, the decoder inherits the theoretical guarantee of MHG on always generating a structurally valid molecule as output. |
|
|
|
## Table of Contents |
|
|
|
1. [Getting Started](#getting-started) |
|
1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs) |
|
2. [Replicating Conda Environment](#replicating-conda-environment) |
|
2. [Feature Extraction](#feature-extraction) |
|
|
|
## Getting Started |
|
|
|
**This code and environment have been tested on Intel E5-2667 CPUs at 3.30GHz and NVIDIA A100 Tensor Core GPUs.** |
|
|
|
### Pretrained Models and Training Logs |
|
|
|
We provide checkpoints of the MHG-GNN model pre-trained on a dataset of ~1.34M molecules curated from PubChem. (later) For model weights: [HuggingFace Link]() |
|
|
|
Add the MHG-GNN `pre-trained weights.pt` to the `models/` directory according to your needs. |
|
|
|
### Replacicating Conda Environment |
|
|
|
Follow these steps to replicate our Conda environment and install the necessary libraries: |
|
|
|
``` |
|
conda create |
|
conda activate mhg-gnn-env |
|
``` |
|
|
|
#### Install Packages with Conda |
|
|
|
``` |
|
conda install -c conda-forge networkx=2.8 |
|
conda install numpy=1.23.5 |
|
# conda install -c conda-forge rdkit=2022.9.4 |
|
conda install pytorch=2.0.0 torchvision torchaudio -c pytorch |
|
conda install -c conda-forge torchinfo=1.8.0 |
|
conda install pyg -c pyg |
|
``` |
|
|
|
#### Install Packages with pip |
|
``` |
|
pip install rdkit torch-nl==0.3 torch-scatter torch-sparse |
|
``` |
|
|
|
## Feature Extraction |
|
|
|
The example notebook [mhg-gnn_encoder_decoder_example.ipynb](notebooks/mhg-gnn_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks. |
|
|
|
To load mhg-gnn, you can simply use: |
|
|
|
```python |
|
import torch |
|
import load |
|
|
|
model = load.load() |
|
``` |
|
|
|
To encode SMILES into embeddings, you can use: |
|
|
|
```python |
|
with torch.no_grad(): |
|
repr = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"]) |
|
``` |
|
|
|
For decoder, you can use the function, so you can return from embeddings to SMILES strings: |
|
|
|
```python |
|
orig = model.decode(repr) |
|
``` |
|
|