How should Shift_to_goal_end be interpreted?

#277
by jislam - opened

Hi,

How should Shift_to_goal_end be interpreted?

Let's say I'm using model_type="CellClassifier" and perturb_type="delete", with only a goal state and all genes.

  1. What does a negative "Shift_to_goal_end" mean? For Gene A, I have a significant p-value and a negative Shift_to_goal_end.
  • Does this mean that deletion of Gene A makes the cell more like its start state (i.e., the perturbed cell is very similar to start)?
  2. What does a positive "Shift_to_goal_end" mean? For Gene B, I have a significant p-value and a positive Shift_to_goal_end.
  • Does this mean that deletion of Gene B makes the cell more like its goal state (i.e., the perturbed cell is very similar to goal)?
  3. What does a "Shift_to_goal_end" value close to 0 mean? For Gene C, I have a significant p-value and a Shift_to_goal_end of 0.
  • Does this mean that deletion of Gene C causes our perturbed cell to be identical to both the start and goal states?
  4. Should I just think of this as "closer to 1, closer to goal state; closer to -1, closer to start state"?

Thanks for making the method so accessible!


Thank you for your interest in Geneformer! Yes, the three states being compared are 1) the original cell, 2) the perturbed cell, and 3) the average embedding position of the provided cells within the goal state.

A perturbation that yields a positive value for "Shift_to_goal_end" is one that shifts the cell from its original position toward the goal embedding position. In other words, within the embedding space, (2) is between (1) and (3). Negative values indicate that the shift is away from the goal embedding position. In other words, (1) is between (2) and (3).

The FDR is the multiple hypothesis-corrected p-value. As with any statistical test, a mean shift that is large in magnitude but not statistically significant is likely very variable, whereas a small shift can still be statistically significant if it is less variable. Generally, I would suggest setting a significance cut-off and then ordering by magnitude.

Keep in mind that all magnitudes will be relatively small, given that we are only modeling the first step in the perturbation (overexpressing or deleting genes without providing the model with the downstream consequences), but we are specifically targeting the direction of the predicted cosine angle shift to be towards the goal end state.
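For readers who want to see the sign convention and the suggested filtering in concrete terms, here is a minimal sketch in plain Python. It is not Geneformer's implementation: it assumes, purely for illustration, that the shift is the change in cosine similarity to the goal embedding (perturbed minus original). The toy 2-D vectors, the gene names, and the row keys `shift` and `fdr` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shift_to_goal(original, perturbed, goal):
    """Assumed definition for illustration only: how much the cosine
    similarity to the goal embedding changes after the perturbation.
    Positive means the cell moved toward the goal, negative means away."""
    return cosine(perturbed, goal) - cosine(original, goal)


def rank_hits(rows, fdr_cutoff=0.05):
    """Keep rows below the FDR cut-off, then order by shift magnitude."""
    significant = [r for r in rows if r["fdr"] < fdr_cutoff]
    return sorted(significant, key=lambda r: abs(r["shift"]), reverse=True)


# Toy 2-D embeddings (hypothetical values).
original = [1.0, 0.0]   # (1) the original cell
goal = [0.0, 1.0]       # (3) mean embedding of the goal-state cells
toward = [1.0, 0.5]     # (2) a perturbed cell that rotated toward the goal
away = [1.0, -0.5]      # (2) a perturbed cell that rotated away from it

print(shift_to_goal(original, toward, goal) > 0)  # True: shift toward goal
print(shift_to_goal(original, away, goal) < 0)    # True: shift away from goal

# Hypothetical result rows; real output tables will have other columns.
rows = [
    {"gene": "GENE_A", "shift": -0.02, "fdr": 0.01},
    {"gene": "GENE_B", "shift": 0.01, "fdr": 0.03},
    {"gene": "GENE_C", "shift": 0.05, "fdr": 0.40},
]
ranked = rank_hits(rows)
print([r["gene"] for r in ranked])  # GENE_C is dropped despite its larger shift
```

Note that ranking by absolute magnitude keeps significant shifts in both directions, so the sign still has to be read separately to tell whether a hit moves the cell toward or away from the goal.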

ctheodoris changed discussion status to closed

Hello! I have a question on the same topic.

In my simulations, I've noticed that many of the genes that lead to a significant shift between start state and goal state (i.e., FDR<0.05) have a negative shift_to_goal_end and positive shift_to_alt_end.
Is this configuration supposed to mean that the embedding is moving away from the goal state and towards the alternative state? Given this, is it still appropriate to assign Sig=1? Can we confidently conclude that the gene perturbation resulted in a significant shift towards the goal position despite the negative 'shift_to_goal_end' value? Or does it mean that my results are incorrect?

Thank you.
