How many total images in the final CAViT-S training dataset (RxRx3 + JUMP-CRISPR/ORF)?

#18
by spud123 - opened

I'm doing some micro-benchmarks, and I was wondering how many images were actually used to train CAViT-S here. RxRx3 is ~2.2 million images and JUMP-CRISPR/ORF seems to be around 1.5 million images. However, cpg0016 has grown over the last few years.

That said, I don't want to overestimate the number of training images and thereby understate your model's performance if the actual total was lower. If you could provide the exact number of images used for training, that would be greatly appreciated.

Thanks! And cool model!

Recursion Pharmaceuticals org

Thanks @spud123 - there were 3,317,455 samples in the training set. If assessing model performance, be sure to consider the control-alignment strategy we describe in the documentation (fit a PCA on all the controls, then standard-scale each batch separately based on its controls). It could also be worthwhile to consider the channel-specific representations produced by the model within the 6 different decoder layers.
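For anyone benchmarking, the control-alignment step mentioned above might look roughly like the following. This is only a minimal sketch, assuming the embeddings are already extracted into a NumPy array with a batch label and a control flag per sample; the function name `align_to_controls` and its parameters are hypothetical, and PCA is done here with a plain SVD rather than any particular library. The model documentation remains the authoritative description.

```python
import numpy as np


def align_to_controls(embeddings, batch_ids, is_control, n_components=None):
    """Hypothetical helper: PCA fit on all controls, then per-batch scaling.

    embeddings: (n_samples, n_features) array of model embeddings
    batch_ids:  (n_samples,) array of batch labels
    is_control: (n_samples,) boolean array, True for control samples
    """
    X = np.asarray(embeddings, dtype=float)
    batch_ids = np.asarray(batch_ids)
    is_control = np.asarray(is_control, dtype=bool)

    # Step 1: fit a PCA on the controls pooled across all batches.
    control_mean = X[is_control].mean(axis=0)
    _, _, vt = np.linalg.svd(X[is_control] - control_mean, full_matrices=False)
    components = vt[:n_components]  # n_components=None keeps every component
    projected = (X - control_mean) @ components.T

    # Step 2: standard-scale each batch separately, using the mean and
    # standard deviation of that batch's own controls.
    aligned = np.empty_like(projected)
    for batch in np.unique(batch_ids):
        in_batch = batch_ids == batch
        controls = projected[in_batch & is_control]
        mu = controls.mean(axis=0)
        sigma = controls.std(axis=0)
        sigma[sigma == 0] = 1.0  # avoid division by zero on constant axes
        aligned[in_batch] = (projected[in_batch] - mu) / sigma
    return aligned
```

After this transform, the controls in every batch have zero mean and unit variance along each retained component, so perturbation embeddings from different batches can be compared on a common scale.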
