library_name: transformers
license: apache-2.0
datasets:
  - yairschiff/qm9

Quick Start Guide

To use this pre-trained model with the HuggingFace APIs, use the following snippet:

from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the hub for list of available models.
# The tokenizer is custom, so loading it is expected to need trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained('yairschiff/qm9-tokenizer', trust_remote_code=True)
model_name = 'kuleshov-group/udlm-qm9'
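# The model repo ships custom modeling code; review it before enabling trust_remote_code.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)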

Model Details

UDLM stands for Uniform Diffusion Language Models. This model was trained using the refined continuous-time ELBO for uniform-noise discrete diffusion introduced in Simple Guidance Mechanisms for Discrete Diffusion Models (arXiv:2412.10193).

Architecture

The model has a context size of 32 tokens and 92M parameters.

The model architecture is based on the Diffusion Transformer (DiT) architecture and consists of:

  • 12 multi-head attention blocks (with 12 attention heads),
  • hidden dimension of 768,
  • adaLN for conditioning on time-step (i.e., during diffusion training / generation).
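
To illustrate the adaLN bullet above, here is a minimal pure-Python sketch of adaptive layer normalization as it is commonly formulated for Diffusion Transformers: a conditioning vector (here standing in for a time-step embedding) is projected to a per-feature shift and scale that modulate the normalized activations. The function names, the toy sizes, and the exact modulation formula are assumptions for illustration; the model's actual implementation lives in the repository's custom code and may differ.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(cond, weight, bias):
    """Dense layer: one output per row of `weight`."""
    return [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(weight, bias)]

def ada_ln(x, cond, w_shift, b_shift, w_scale, b_scale):
    """Adaptive layer norm: shift and scale are predicted from the conditioning vector."""
    shift = linear(cond, w_shift, b_shift)
    scale = linear(cond, w_scale, b_scale)
    return [n * (1.0 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]

# Toy sizes (assumption): hidden size 4 instead of 768, conditioning size 2.
x = [1.0, 2.0, 3.0, 4.0]
cond = [0.5, -1.0]          # hypothetical time-step embedding
zero_w = [[0.0, 0.0] for _ in x]
zero_b = [0.0 for _ in x]

# With zero projections the modulation is the identity, so adaLN equals plain layer norm.
assert ada_ln(x, cond, zero_w, zero_b, zero_w, zero_b) == layer_norm(x)
```

With non-zero projection weights, the same token vector is normalized identically but rescaled and shifted differently for each time step, which is how the conditioning signal reaches every block.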

Training Details

The model was trained using the yairschiff/qm9-tokenizer tokenizer, a custom tokenizer for parsing SMILES strings. We trained for 25k gradient update steps with a batch size of 2,048. We used a linear warm-up over 1,000 steps to a learning rate of 3e-4, then applied cosine decay down to a minimum learning rate of 3e-6.
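
The learning-rate schedule described above can be sketched as a plain function of the step index. This is an illustrative reconstruction, not the training code: it assumes the warm-up starts near zero and that the cosine decay spans the remaining steps and reaches the minimum exactly at step 25,000. The constant and function names are hypothetical.

```python
import math

PEAK_LR = 3e-4         # learning rate reached at the end of warm-up
MIN_LR = 3e-6          # floor reached at the end of cosine decay
WARMUP_STEPS = 1_000
TOTAL_STEPS = 25_000   # assumption: decay finishes on the last training step

def learning_rate(step: int) -> float:
    """Linear warm-up followed by cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        # Assumption: warm-up ramps linearly from PEAK_LR / WARMUP_STEPS up to PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

for step in (0, 999, 13_000, 25_000):
    print(step, learning_rate(step))
```

Under these assumptions the rate peaks at the end of warm-up, sits halfway between the peak and the floor at step 13,000, and ends at the floor.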

For more details, please refer to our work: Simple Guidance Mechanisms for Discrete Diffusion Models.

Citation

Please cite our work using the BibTeX entry below:

@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}