Recalculate gene_median_dictionary for continual training on human organoid datasets?

#464
by Jayce77 - opened

Hi, thanks for the wonderful work!
I plan to do continual training on the 95M model using a 1.7 million human organoid dataset. It seems that the gene_median_95M you provided is not based on human organoid data. Do you think I need to recalculate the gene medians before continual training? I would appreciate any suggestions on this. Thank you!

Thanks for your question! We recommend keeping the same median dictionary as the original model, so that the scaling matches what was used in pretraining and the medians are derived from a larger diversity of cells. (Also, organoid data was not excluded from the pretraining corpus.)
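
To illustrate why the scaling matters, here is a minimal sketch of median-scaled ranking. It is not the Geneformer tokenizer itself: the function `rank_genes`, the `target_sum` value, the gene names, the counts, and both median dictionaries are hypothetical and chosen only for illustration. It assumes each gene's count is normalized by the cell's total counts, divided by that gene's median, and then ranked from highest to lowest.

```python
# Illustrative sketch only; not Geneformer's actual tokenizer code.
# Gene names, counts, medians, and target_sum are made-up assumptions.

def rank_genes(counts, gene_medians, target_sum=10_000):
    """Return detected genes ordered by median-scaled expression, highest first."""
    total = sum(counts.values())
    scaled = {}
    for gene, count in counts.items():
        # Skip undetected genes and genes missing from the median dictionary.
        if count == 0 or gene not in gene_medians:
            continue
        # Normalize by the cell's total counts, then divide by the gene's median.
        scaled[gene] = (count / total * target_sum) / gene_medians[gene]
    return sorted(scaled, key=scaled.get, reverse=True)


# One hypothetical cell with raw counts per gene.
cell = {"GENE_A": 50, "GENE_B": 5, "GENE_C": 20, "GENE_D": 0}

# Hypothetical medians: one set standing in for the pretraining dictionary,
# one standing in for medians recomputed on a narrower dataset.
pretraining_medians = {"GENE_A": 10.0, "GENE_B": 0.5, "GENE_C": 2.5, "GENE_D": 1.0}
recomputed_medians = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 8.0, "GENE_D": 1.0}

order_pretraining = rank_genes(cell, pretraining_medians)
order_recomputed = rank_genes(cell, recomputed_medians)

print(order_pretraining)  # ['GENE_B', 'GENE_C', 'GENE_A']
print(order_recomputed)   # ['GENE_A', 'GENE_B', 'GENE_C']
```

In this toy example the same cell produces a different gene order under the two dictionaries, so a model that learned from one ordering would see shifted inputs under the other.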

ctheodoris changed discussion status to closed
