# Model documentation & parameters

**Algorithm Version**: Which model version to use.

**Target binding energy**: The desired binding energy. The optimal range determined in [literature](https://doi.org/10.1039/C8SC01949E) is between -31.1 and -23.0 kcal/mol.

**Primer SMILES**: A SMILES string is used to prime the generation.

**Maximal sequence length**: The maximal number of tokens in the generated molecule.

**Number of points**: Number of points to sample with the Gaussian Process.

**Number of steps**: Number of optimization steps in the Gaussian Process optimization.

**Number of samples**: How many samples should be generated (between 1 and 50).


# Model card -- AdvancedManufacturing

**Model Details**: *AdvancedManufacturing* is a sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.

**Developers**: Oliver Schilter and colleagues from IBM Research.

**Distributors**: Original authors' code integrated into GT4SD.

**Model date**: Not yet published. Manuscript accepted.

**Model version**: Different types of models trained on 7054 data points are represented either as SMILES or SELFIES. Augmentation was used to broaden the scope augmentation.

**Model type**: A sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: 
N.A.

**Paper or other resources for more information**: 


**License**: MIT

**Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular, to discover new Suzuki cross-coupling catalysts.

**Primary intended uses/users**: Researchers and computational chemists using the model for research exploration purposes.

**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.

**Metrics**: N.A.

**Datasets**: Data used for training was provided through the NCCR and can be found [here](https://doi.org/10.24435/materialscloud:2018.0014/v1) and [here](https://doi.org/10.24435/materialscloud:2019.0007/v3).

**Ethical Considerations**: Unclear, please consult with original authors in case of questions.

**Caveats and Recommendations**: Unclear, please consult with original authors in case of questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)

## Citation
Please cite:
```bib
@article{manica2023accelerating,
  title={Accelerating material design with the generative toolkit for scientific discovery},
  author={Manica, Matteo and Born, Jannis and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Clarke, Dean and Teukam, Yves Gaetan Nana and Giannone, Giorgio and Hoffman, Samuel C and Buchan, Matthew and others},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={69},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
```