Question about Fig.2c

#118
by weilangchan - opened

[Attached image: 微信图片_20230716183055.jpg]
Thanks for your amazing work. I am trying to reproduce it, but I found that using only 10,000 cells gives almost the same results as 100,000 cells in Geneformer. I am wondering whether this is due to insufficient training. Could you please share the training parameters (epochs, batch size, etc.) used in Fig. 2c?

Thank you for your interest in Geneformer! Please see the example notebook for gene classification, as well as the manuscript Methods, for the training hyperparameters. As noted in the manuscript Methods, we did not optimize hyperparameters for the downstream applications (aside from the cardiomyopathy disease classifier) in order to provide an equitable comparison to alternative approaches, but optimizing hyperparameters is highly recommended and can significantly boost performance. Please also note that the x-axis refers to the number of cells in the pretraining corpus, not the fine-tuning sample, which remains at 10K cells for all points in the graph. Finally, as demonstrated in Fig. 4c-d, the relevance of the cells used for fine-tuning to the given task can also strongly affect performance.

ctheodoris changed discussion status to closed

Also, did the pretraining runs on the 10,000-cell and 100,000-cell corpora use the same hyperparameters?

weilangchan changed discussion status to open

Yes, the pretraining hyperparameters were the same.
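
For anyone reproducing the figure, the setup described in this thread can be summarized in a rough sketch: only the pretraining corpus size varies, while the fine-tuning sample stays at 10K cells and the pretraining hyperparameters are shared. The names `RunConfig` and `build_runs` are hypothetical and are not part of Geneformer, and the `"shared"` label is a placeholder rather than an actual hyperparameter set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """One point on the plot (hypothetical structure, not Geneformer's API)."""
    pretrain_cells: int      # x-axis: size of the pretraining corpus
    finetune_cells: int      # held constant for every point
    pretrain_hparams: str    # label for the shared pretraining hyperparameter set


def build_runs(pretrain_sizes, finetune_cells=10_000, pretrain_hparams="shared"):
    """Vary only the pretraining corpus size; keep everything else fixed."""
    return [RunConfig(n, finetune_cells, pretrain_hparams) for n in pretrain_sizes]


# Only the two sizes mentioned in this thread; the figure may include others.
runs = build_runs([10_000, 100_000])
for run in runs:
    print(run)
```

Under this design, similar scores at two x-axis positions reflect the effect of pretraining corpus size alone, not of the number of fine-tuning cells.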

ctheodoris changed discussion status to closed
