Spaces:
Running
on
Zero
Downloading datasets
This directory includes instructions and scripts for downloading ImageNet and LSUN bedrooms for use in this codebase.
Class-conditional ImageNet
For our class-conditional models, we use the official ILSVRC2012 dataset with manual center cropping and downsampling. To obtain this dataset, navigate to this page on image-net.org and sign in (or create an account if you do not already have one). Then click on the link reading "Training images (Task 1 & 2)". This is a 138GB tar file containing 1000 sub-tar files, one per class.
Once the file is downloaded, extract it and look inside. You should see 1000 .tar
files. You need to extract each of these, which may be impractical to do by hand on your operating system. To automate the process on a Unix-based system, you can cd
into the directory and run this short shell script:
for file in *.tar; do tar xf "$file"; rm "$file"; done
This will extract and remove each tar file in turn.
Once all of the images have been extracted, the resulting directory should be usable as a data directory (the --data_dir
argument for the training script). The filenames should all start with WNID (class ids) followed by underscores, like n01440764_2708.JPEG
. Conveniently (but not by accident) this is how the automated data-loader expects to discover class labels.
LSUN bedroom
To download and pre-process LSUN bedroom, clone fyu/lsun on GitHub and run their download script python3 download.py bedroom
. The result will be an "lmdb" database named like bedroom_train_lmdb
. You can pass this to our lsun_bedroom.py script like so:
python lsun_bedroom.py bedroom_train_lmdb lsun_train_output_dir
This creates a directory called lsun_train_output_dir
. This directory can be passed to the training scripts via the --data_dir
argument.