ibm-research
/

materials.mhg-ged

Feature Extraction

Model card Files Files and versions Community

materials.mhg-ged / mhg_gnn.egg-info /PKG-INFO

ipd's picture

ipd

init

197c331 4 months ago

3.14 kB

	Metadata-Version: 2.1
	Name: mhg-gnn
	Version: 0.0
	Summary: Package for mhg-gnn
	Author: team
	License: TBD
	Classifier: Programming Language :: Python :: 3
	Classifier: Programming Language :: Python :: 3.9
	Description-Content-Type: text/markdown
	Requires-Dist: networkx>=2.8
	Requires-Dist: numpy<2.0.0,>=1.23.5
	Requires-Dist: pandas>=1.5.3
	Requires-Dist: rdkit-pypi<2023.9.6,>=2022.9.4
	Requires-Dist: torch>=2.0.0
	Requires-Dist: torchinfo>=1.8.0
	Requires-Dist: torch-geometric>=2.3.1

	# mhg-gnn

	This repository provides PyTorch source code assosiated with our publication, "MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network"

	Paper: [Arxiv Link](https://arxiv.org/pdf/2309.16374)

	For more information contact: SEIJITKD@jp.ibm.com

	![mhg-gnn](images/mhg_example1.png)

	## Introduction

	We present MHG-GNN, an autoencoder architecture
	that has an encoder based on GNN and a decoder based on a sequential model with MHG.
	Since the encoder is a GNN variant, MHG-GNN can accept any molecule as input, and
	demonstrate high predictive performance on molecular graph data.
	In addition, the decoder inherits the theoretical guarantee of MHG on always generating a structurally valid molecule as output.

	## Table of Contents

	1. [Getting Started](#getting-started)
	1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
	2. [Replicating Conda Environment](#replicating-conda-environment)
	2. [Feature Extraction](#feature-extraction)

	## Getting Started

	This code and environment have been tested on Intel E5-2667 CPUs at 3.30GHz and NVIDIA A100 Tensor Core GPUs.

	### Pretrained Models and Training Logs

	We provide checkpoints of the MHG-GNN model pre-trained on a dataset of ~1.34M molecules curated from PubChem. (later) For model weights: [HuggingFace Link]()

	Add the MHG-GNN `pre-trained weights.pt` to the `models/` directory according to your needs.

	### Replacicating Conda Environment

	Follow these steps to replicate our Conda environment and install the necessary libraries:

	```
	conda create --name mhg-gnn-env python=3.8.18
	conda activate mhg-gnn-env
	```

	#### Install Packages with Conda

	```
	conda install -c conda-forge networkx=2.8
	conda install numpy=1.23.5
	# conda install -c conda-forge rdkit=2022.9.4
	conda install pytorch=2.0.0 torchvision torchaudio -c pytorch
	conda install -c conda-forge torchinfo=1.8.0
	conda install pyg -c pyg
	```

	#### Install Packages with pip
	```
	pip install rdkit torch-nl==0.3 torch-scatter torch-sparse
	```

	## Feature Extraction

	The example notebook [mhg-gnn_encoder_decoder_example.ipynb](notebooks/mhg-gnn_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks.

	To load mhg-gnn, you can simply use:

	```python
	import torch
	import load

	model = load.load()
	```

	To encode SMILES into embeddings, you can use:

	```python
	with torch.no_grad():
	repr = model.encode(["CCO", "O=C=O", "OC(=O)c1ccccc1C(=O)O"])
	```

	For decoder, you can use the function, so you can return from embeddings to SMILES strings:

	```python
	orig = model.decode(repr)
	```