# U-DiT Models
These are the official U-DiT models from our work "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers". The models are trained on the ImageNet 256x256 dataset for the iteration counts listed below.

## Model Details
| Model Name | FLOPs (G) | Training Iters | FID |
|---|---|---|---|
| U-DiT-S | 6.04 | 400K | 31.51 |
| U-DiT-B | 22.22 | 400K | 16.64 |
| U-DiT-L | 85.00 | 400K | 10.08 |
| U-DiT-B | 22.22 | 1M | 12.87 |
| U-DiT-L | 85.00 | 1M | 7.54 |
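The FLOPs savings come from the idea in the paper's title: downsampling the token grid before self-attention. Below is a minimal NumPy sketch of a 2x2 token downsampler to illustrate the concept; the function name, shapes, and reshape scheme are illustrative assumptions, not the official implementation.

```python
import numpy as np

# Illustrative sketch (not the official U-DiT code): fold a (H, W, C) grid of
# tokens into a (H/2, W/2, 4C) grid, so attention runs over 4x fewer tokens
# with 4x wider features.
def downsample_tokens(tokens: np.ndarray) -> np.ndarray:
    H, W, C = tokens.shape
    assert H % 2 == 0 and W % 2 == 0, "token grid must be divisible by 2"
    t = tokens.reshape(H // 2, 2, W // 2, 2, C)   # split into 2x2 patches
    t = t.transpose(0, 2, 1, 3, 4)                # group each patch's 4 tokens
    return t.reshape(H // 2, W // 2, 4 * C)       # concatenate along channels

grid = np.arange(16 * 16 * 8).reshape(16, 16, 8).astype(np.float32)
down = downsample_tokens(grid)
print(down.shape)  # (8, 8, 32): 64 tokens instead of 256
```

Since self-attention cost grows quadratically with the number of tokens, quartering the token count in this way cuts the attention cost by roughly 16x, which is the kind of saving reflected in the FLOPs column above.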
## Citation
If you find this model useful, please cite:
```bibtex
@misc{tian2024udits,
  title={U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers},
  author={Yuchuan Tian and Zhijun Tu and Hanting Chen and Jie Hu and Chao Xu and Yunhe Wang},
  year={2024},
  eprint={2405.02730},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```