
Model documentation & parameters

Algorithm Version: Which model version to use.

Target binding energy: The desired binding energy in kcal/mol. The optimal range reported in the literature is between -31.1 and -23.0 kcal/mol.

Primer SMILES: A SMILES string is used to prime the generation.

Maximal sequence length: The maximal number of tokens in the generated molecule.

Number of points: Number of points to sample with the Gaussian Process.

Number of steps: Number of optimization steps in the Gaussian Process optimization.

Number of samples: How many samples should be generated (between 1 and 50).
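The constraints stated in the parameter list above can be sketched as a small validation helper. This is only an illustration: `GenerationRequest`, its field names, and `target_in_optimal_range` are hypothetical and not part of GT4SD. The positive-integer checks are an assumption, while the sample bounds and the binding-energy range are taken from the descriptions above.

```python
from dataclasses import dataclass

# Optimal range quoted in the parameter description (kcal/mol).
OPTIMAL_BINDING_ENERGY = (-31.1, -23.0)


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the parameters listed above (not a GT4SD class)."""

    algorithm_version: str
    target_binding_energy: float  # kcal/mol
    primer_smiles: str
    maximal_sequence_length: int  # maximal number of tokens
    number_of_points: int
    number_of_steps: int
    number_of_samples: int

    def __post_init__(self) -> None:
        # Documented constraint: between 1 and 50 samples.
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")
        # Assumption: the remaining counts must be positive integers.
        for name in ("maximal_sequence_length", "number_of_points", "number_of_steps"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")

    def target_in_optimal_range(self) -> bool:
        """Return True if the target lies within the quoted optimal range."""
        low, high = OPTIMAL_BINDING_ENERGY
        return low <= self.target_binding_energy <= high


# Illustrative values only; "v0" and "CCO" are placeholders, not recommended settings.
request = GenerationRequest(
    algorithm_version="v0",
    target_binding_energy=-25.0,
    primer_smiles="CCO",
    maximal_sequence_length=100,
    number_of_points=10,
    number_of_steps=50,
    number_of_samples=5,
)
print(request.target_in_optimal_range())
```

A target outside the quoted range is not rejected here, since the description presents the range as a recommendation rather than a hard limit.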

Model card -- AdvancedManufacturing

Model Details: AdvancedManufacturing is a sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.

Developers: Oliver Schilter and colleagues from IBM Research.

Distributors: Original authors' code integrated into GT4SD.

Model date: Not yet published. Manuscript accepted.

Model version: Different types of models, trained on 7054 data points represented either as SMILES or SELFIES. Augmentation was used to broaden the scope of the training data.

Model type: A sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.

Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: N.A.

Paper or other resources for more information:

License: MIT

Where to send questions or comments about the model: Open an issue on the GT4SD repository.

Intended Use. Use cases that were envisioned during development: Chemical research, in particular, to discover new Suzuki cross-coupling catalysts.

Primary intended uses/users: Researchers and computational chemists using the model for research exploration purposes.

Out-of-scope use cases: Production-level inference, producing molecules with harmful properties.

Metrics: N.A.

Datasets: Data used for training was provided through the NCCR and can be found here and here.

Ethical Considerations: Unclear. Please consult the original authors in case of questions.

Caveats and Recommendations: Unclear. Please consult the original authors in case of questions.

Model card prototype inspired by Mitchell et al. (2019)

Citation

Please cite:

@article{manica2023accelerating,
  title={Accelerating material design with the generative toolkit for scientific discovery},
  author={Manica, Matteo and Born, Jannis and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Clarke, Dean and Teukam, Yves Gaetan Nana and Giannone, Giorgio and Hoffman, Samuel C and Buchan, Matthew and others},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={69},
  year={2023},
  publisher={Nature Publishing Group UK London}
}