Training data / resources

#2 opened by erotemic

I'd like to understand more about how much data DOFA was pretrained with, on what computational resources, and for how long.

  • How many datasets was it pretrained on?
  • How big was each dataset?
  • What GPU model, and how many GPUs, was it trained on?
  • How long was it trained for?

The relevant information I can find in the paper is:

To reduce the computational cost of self-supervised training on extensive datasets, we design a continual pretraining strategy inspired by Mendieta et al. [20] incorporating a distillation loss and a weight initialization strategy. This method effectively utilizes knowledge from expansive, supervised, pretrained models, reducing the computational burden and associated CO2 emissions.
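
For reference, here is my rough mental model of that continual-pretraining setup as a minimal PyTorch sketch: the student is initialized from the frozen teacher's weights and trained with a feature-distillation loss. The `TinyEncoder` module, the MSE feature-matching loss, and the optimizer settings are placeholders of my own, not taken from the DOFA code, so corrections are welcome if the actual objective differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from copy import deepcopy

# Hypothetical stand-in backbone; the real DOFA/GFM backbones are ViT-style encoders.
class TinyEncoder(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

# Weight initialization strategy: the student starts from the pretrained teacher's weights.
teacher = TinyEncoder()          # imagine this loaded from a large supervised checkpoint
student = deepcopy(teacher)      # continual pretraining starts from the same weights
teacher.eval()
for p in teacher.parameters():   # teacher is frozen; only the student is updated
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def distillation_step(batch):
    """One continual-pretraining step: match student features to frozen teacher features."""
    with torch.no_grad():
        target = teacher(batch)
    pred = student(batch)
    # Simple feature-distillation loss (MSE here); the paper's exact loss may differ.
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random "features" standing in for encoded image patches.
for _ in range(3):
    batch = torch.randn(8, 768)
    print(distillation_step(batch))
```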

Looking at [20] I see:

8 NVIDIA V100 GPUs with a batch size of 2048 (128 per GPU) and an image size of 192×192, with the GFM variant taking 93 hours.
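
If I'm reading that correctly, the GFM run works out to roughly 8 GPUs × 93 hours ≈ 744 V100 GPU-hours.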

Is that the same as, or similar to, the setup used to produce the DOFA weights? Any further details about the hardware and training procedure would be helpful.
