
## Model documentation & parameters

**Algorithm Version**: Which model checkpoint to use (checkpoints are trained on different datasets).

**Scaffolds**: One or multiple scaffolds (or seed molecules), provided as '.'-separated SMILES. If empty, no scaffolds are used.

**Number of samples**: How many molecules to generate (between 1 and 50).

**Beam size**: Beam size used in beam-search decoding (larger values are slower but give better results).

**Seed**: The random seed used for initialization.
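
The parameters above map onto the programmatic interface. As a minimal sketch (not the definitive GT4SD invocation), the snippet below uses the authors' `molecule_generation` package, following the `load_model_from_directory`, `sample`, `encode`, and `decode` calls shown in their GitHub README; the checkpoint path is a placeholder, and the `scaffolds` keyword mirrors the "Scaffolds" parameter above. Beam size and seed are exposed as separate knobs by the GT4SD wrapper's configuration.

```python
# Minimal sketch: sampling and scaffold-conditioned decoding with the
# authors' `molecule_generation` package (pip install molecule-generation).
# "./moler_checkpoint" is a placeholder; download a trained checkpoint
# from the authors' GitHub repository first.
from molecule_generation import load_model_from_directory

with load_model_from_directory("./moler_checkpoint") as model:
    # Unconditional generation ("Number of samples" above).
    smiles = model.sample(10)

    # Scaffold-conditioned decoding ("Scaffolds" above): each latent
    # vector is decoded so that the output contains the given scaffold.
    latents = model.encode(["c1ccccc1"])
    decorated = model.decode(latents, scaffolds=["c1ccccc1"])

    print(smiles)
    print(decorated)
```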

## Model card

**Model Details**: MoLeR is a graph-based molecular generative model that can be conditioned (primed) on scaffolds. The model decorates scaffolds with realistic structural motifs.

**Developers**: Krzysztof Maziarz and co-authors from Microsoft Research and Novartis (full reference at the bottom).

**Distributors**: The developers' code, wrapped and distributed by the GT4SD team (2023) at IBM Research.

**Model date**: Released around March 2022.

**Model version**: Model checkpoints provided by the original authors; see their GitHub repository.

**Model type**: An encoder-decoder graph neural network (GNN) for molecular generation.
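
Because the model is an encoder-decoder, molecules can be mapped into a continuous latent space and back, which is what enables latent-space interpolation and optimization. A hedged sketch, again assuming the `encode`/`decode` API from the authors' `molecule_generation` README and a placeholder checkpoint path:

```python
from molecule_generation import load_model_from_directory

with load_model_from_directory("./moler_checkpoint") as model:
    # Encode molecules into continuous latent vectors (numpy arrays).
    latents = model.encode(["c1ccccc1", "CCO"])

    # Decode the midpoint of the two latents: a simple example of the
    # latent-space arithmetic an encoder-decoder design makes possible.
    midpoint = (latents[0] + latents[1]) / 2.0
    print(model.decode([midpoint]))
```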

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: Trained by the original authors with the default parameters provided on GitHub.

**Paper or other resource for more information**: *Learning to Extend Molecular Scaffolds with Structural Motifs* (ICLR 2022).

**License**: MIT

**Where to send questions or comments about the model**: Open an issue on the original authors' GitHub repository.

**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.

**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration.

**Out-of-scope use cases**: Production-level inference; producing molecules with harmful properties.

**Factors**: Not applicable.

**Metrics**: Validation loss on decoding correct molecules; evaluated on several downstream tasks.

**Datasets**: 1.5M drug-like molecules from the GuacaMol benchmark; fine-tuned on 20 molecular optimization tasks from GuacaMol.

**Ethical Considerations**: Unclear; please consult the original authors in case of questions.

**Caveats and Recommendations**: Unclear; please consult the original authors in case of questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://arxiv.org/abs/1810.03993).

## Citation

```bibtex
@inproceedings{maziarz2021learning,
  author    = {Krzysztof Maziarz and Henry Richard Jackson{-}Flux and Pashmina Cameron and
               Finton Sirockin and Nadine Schneider and Nikolaus Stiefl and Marwin H. S. Segler and Marc Brockschmidt},
  title     = {Learning to Extend Molecular Scaffolds with Structural Motifs},
  booktitle = {The Tenth International Conference on Learning Representations, {ICLR}},
  year      = {2022}
}
```