Need advice getting started

#348
by cstrlln - opened

Thanks for this tool; it looks like it will be super useful for some of our projects.
Disclaimer: I'm an absolute beginner at using machine learning models. I think I can follow most of the code, but I want to better understand the workflow, so pardon me if some of the concepts I'm missing are obvious.

I have a dataset I want to use to fine-tune the Geneformer model so that I can use Geneformer's classification and perturbation tools, but I'm not completely sure at which step the fine-tuning actually happens. Is it with geneformer.Classifier? Will this generate my fine-tuned model and save it for future applications? What are the steps to follow, in broad terms?

In particular, my dataset has cells from different maturation stages, and I would like to know which genes are most important for reaching the later stages (à la velocity or pseudotime, I guess), or what happens to certain clusters when certain genes are disrupted or overexpressed.

I'm looking for a place to start. It seems that tokenizer, classifier, perturbation is the sequence to follow?
I'm reading through the different examples and noticed in the discussions that hyperparam_optimiz_for_disease_classifier.py is sometimes recommended. Where does this fit in the workflow? I also noticed that the examples differ between the main and pr_146 branches, with the latter being more expanded. Which ones do you recommend following to get started? If you could provide a simple outline of how to go about this, I would really appreciate it.

Thank you!

Thank you for your questions and interest in Geneformer! Since your ultimate goal is to understand in silico perturbations that shift between cell states, the recommended process would be:

  1. tokenize your single-cell data
  2. fine-tune the model to distinguish your cell states of interest
  3. perform in silico perturbation to determine which gene perturbations shift cells between your start and goal states
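
As a rough illustration only, the three steps above might be laid out like this. All paths, the "maturation_stage" column, and the stage labels are hypothetical placeholders. The geneformer calls reflect my understanding of the current package and may differ between versions, so please check argument names against the documentation before running.

```python
from pathlib import Path

# Hypothetical paths and labels; replace them with your own.
RAW_DIR = Path("raw_loom")        # directory of .loom files with raw counts
TOKEN_DIR = Path("tokenized")     # where the tokenized dataset is written
OUT_DIR = Path("outputs")         # where classifier outputs are written
STATE_KEY = "maturation_stage"    # assumed cell-metadata column name


def tokenize():
    """Step 1: tokenize the single-cell data."""
    from geneformer import TranscriptomeTokenizer

    # Keep the state label so it is available in the tokenized dataset.
    tk = TranscriptomeTokenizer({STATE_KEY: STATE_KEY}, nproc=4)
    tk.tokenize_data(str(RAW_DIR), str(TOKEN_DIR), "my_cells")


def prepare_classifier():
    """Step 2 (first half): set up a cell-state classifier and its data."""
    from geneformer import Classifier

    cc = Classifier(
        classifier="cell",
        cell_state_dict={"state_key": STATE_KEY, "states": "all"},
        nproc=4,
    )
    # Assumption: the tokenizer wrote <output_prefix>.dataset in TOKEN_DIR.
    cc.prepare_data(
        input_data_file=str(TOKEN_DIR / "my_cells.dataset"),
        output_directory=str(OUT_DIR),
        output_prefix="stage_classifier",
    )
    # The fine-tuning itself is then run through the Classifier's
    # training/validation methods; see the documentation for arguments.
    return cc


def cell_states_to_model(start, goal, alt=()):
    """Step 3 (input): describe the state shift to model in silico."""
    return {
        "state_key": STATE_KEY,
        "start_state": start,
        "goal_state": goal,
        "alt_states": list(alt),
    }


# Example with hypothetical stage labels; a dictionary like this is passed
# to the in silico perturbation step together with the fine-tuned model.
shift = cell_states_to_model("early", "late", alt=["intermediate"])
```

The geneformer imports sit inside the functions so the file can be loaded and the configuration inspected before the package and data are in place.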

The pr_146 branch and hyperparam_optimiz_for_disease_classifier.py predate our consolidation of the classification process into a module for ease of use, so you should follow the examples in the current repository.

For additional documentation, please see https://geneformer.readthedocs.io/en/latest/

ctheodoris changed discussion status to closed