Training data / resources
I'd like to understand more about how much data DOFA was pretrained with, on what computational resources, and for how long.
- How many datasets was it pretrained on?
- How big was each dataset?
- What GPU model, and how many GPUs, was it trained on?
- How long was it trained for?
The only information related to this that I can find in the paper is:
> To reduce the computational cost of self-supervised training on extensive datasets, we design a continual pretraining strategy inspired by Mendieta et al. [20] incorporating a distillation loss and a weight initialization strategy. This method effectively utilizes knowledge from expansive, supervised, pretrained models, reducing the computational burden and associated CO2 emissions.
Looking at [20] I see:
> 8 NVIDIA V100 GPUs with a batch size of 2048 (128 per GPU) and the image size of 192×192. And the GFM variant uses 93 hours.
Is that the same as, or similar to, what was used for the released DOFA weights? If those numbers carried over directly, that would be on the order of 8 × 93 ≈ 750 V100 GPU-hours, but I don't want to assume. Any more details about the hardware and training procedure that were used would be helpful.
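
In case it helps make the question concrete, below is a minimal PyTorch sketch of what I understand a continual-pretraining step with a frozen supervised teacher, a distillation loss, and teacher-based weight initialization to look like. The tiny encoders, the cosine-similarity loss, and all hyperparameters are placeholders I made up for illustration; none of this is taken from the DOFA code or paper, so please correct me if the actual setup differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder encoders standing in for the frozen supervised teacher and the
# student being continually pretrained; DOFA's real (ViT-based) architecture
# and wavelength-conditioned patch embedding are not reproduced here.
def tiny_encoder(dim: int = 256) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=8, stride=8),  # coarse patchify
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, dim),
    )

teacher = tiny_encoder().eval()
student = tiny_encoder()
# Crude stand-in for the "weight initialization strategy" mentioned in the quote:
# start the student from the teacher's weights.
student.load_state_dict(teacher.state_dict())
for p in teacher.parameters():
    p.requires_grad = False  # teacher stays frozen; only the student is updated

def distillation_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Cosine-distance feature distillation (one common choice; the paper's exact loss may differ)."""
    return 1.0 - F.cosine_similarity(student_feats, teacher_feats, dim=-1).mean()

optimizer = torch.optim.AdamW(student.parameters(), lr=1.5e-4)

# Dummy batch at the 192x192 resolution quoted from [20]; the batch size here is arbitrary.
images = torch.randn(8, 3, 192, 192)
with torch.no_grad():
    targets = teacher(images)  # teacher features serve as distillation targets
loss = distillation_loss(student(images), targets)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```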