---
library_name: transformers
license: apache-2.0
datasets:
  - billion-word-benchmark/lm1b
---

## Quick Start Guide

To use this pre-trained model with the HuggingFace APIs, use the following snippet:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the hub for a list of available models.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model_name = 'kuleshov-group/udlm-lm1b'
model = AutoModelForMaskedLM.from_pretrained(model_name)
```

## Model Details

UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined continuous-time ELBO for uniform-noise discrete diffusion introduced [here](https://arxiv.org/abs/2412.10193).

### Architecture

The model has a context size of 128 tokens and 139M parameters.
The architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads each),
- a hidden dimension of 768,
- `adaLN` for conditioning on the diffusion time-step during training and generation.

A short sanity-check snippet that verifies these numbers is included at the end of this card.

### Training Details

The model was trained using the `bert-base-uncased` tokenizer.
We trained for 1M gradient update steps with a batch size of 512, using a linear warm-up over the first 2,500 steps to a constant learning rate of 3e-4.
For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).

## Citation

Please cite our work using the BibTeX below:

### BibTeX:

```
@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}
```
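
The sketch below is a minimal sanity check: it reuses the loading calls from the Quick Start snippet to confirm the reported parameter count and tokenizes an example sentence at the 128-token context length. The example text is arbitrary, and sampling from the uniform diffusion process is not part of the standard masked-LM API, so it is not shown here.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Same loading calls as in the Quick Start snippet above.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('kuleshov-group/udlm-lm1b')

# The parameter count should be roughly 139M, as stated in the Architecture section.
num_params = sum(p.numel() for p in model.parameters())
print(f'Parameters: {num_params / 1e6:.0f}M')

# Inputs are padded/truncated to the model's 128-token context size.
batch = tokenizer(
    'the quick brown fox jumps over the lazy dog',
    padding='max_length',
    truncation=True,
    max_length=128,
    return_tensors='pt',
)
print(batch['input_ids'].shape)  # torch.Size([1, 128])
```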