Discrepancy between number of transformer layers in config and paper
#33
by
Sahiljain314
- opened
I noticed that the config.json for the SDXL UNet contains the following: https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/main/unet/config.json#L59, which indicates there is 1 transformer block at the highest resolution level.
However, the SDXL paper makes a big point of stating that the actual transformer block counts are [0, 2, 10], and that the authors omitted transformer blocks entirely at the highest resolution level.
Am I missing something? If not, which one is correct?
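Both are consistent. The first down block is a DownBlock2D, which contains no transformer layers, so the first entry in the config list (the 1) is ignored. The effective counts are therefore [0, 2, 10], as in the paper.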
https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/025709258a55cc924dc47efd88959f18ae79830e/unet/config.json#L27
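To make the mapping concrete, here is a minimal sketch of how the per-block counts line up with the block types. The two lists are assumed to mirror the values in the linked unet/config.json, so please check them against the file. `effective_transformer_layers` is a hypothetical helper written for illustration, not part of diffusers, and the "CrossAttn" name check is only a stand-in for how the library decides which blocks get transformer layers.

```python
# Assumed to mirror unet/config.json for SDXL base; verify against the linked file.
down_block_types = ["DownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D"]
transformer_layers_per_block = [1, 2, 10]


def effective_transformer_layers(block_types, layers_per_block):
    """Hypothetical helper: count transformer layers actually used per block.

    Blocks without cross-attention (e.g. DownBlock2D) have no transformer
    layers, so their configured count is treated as 0.
    """
    return [
        count if "CrossAttn" in block_type else 0
        for block_type, count in zip(block_types, layers_per_block)
    ]


effective = effective_transformer_layers(down_block_types, transformer_layers_per_block)
print(effective)  # [0, 2, 10]
```

Under those assumptions the result matches the [0, 2, 10] reported in the paper, even though the config lists 1 for the first block.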