in_silico_perturber error: 'original_max_len' referenced before assignment

#182
by Pingiotto - opened

Hi Dr Theodoris, thank you for your fantastic work. I am trying to reproduce your in silico perturbation results by deleting GATA4 from the human_dcm_hcm_nf dataset using the fine-tuned CellClassifier_cardiomyopathies model you have recently uploaded.

But I am getting the following error:
  File "in_silico_perturber.py", line 413, in quant_cos_sims
    attention_mask = gen_attention_mask(original_minibatch, original_max_len)
UnboundLocalError: local variable 'original_max_len' referenced before assignment

This happens because the following condition is not met, so original_max_len is never defined:
if (len(original_minibatch_length_set) > 1) or (max(original_minibatch_length_set) > new_max_len):

I checked the values of the relevant variables just before the conditional:
len(original_minibatch_length_set) = 1
max(original_minibatch_length_set) = 2048
new_max_len = 2048
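To make the failure mode concrete, here is a minimal, hypothetical reproduction of the pattern described above. It is not the Geneformer source: the function names pad_minibatch_buggy / pad_minibatch_fixed and the min(...) computation are assumptions made only for illustration. With the values reported above (one distinct length, equal to new_max_len), the conditional is skipped and the variable is never bound. One common remedy is to bind it unconditionally before the branch.

```python
# Hypothetical, simplified reproduction of the reported bug pattern.
# This is NOT the actual Geneformer code; names are illustrative only.

def pad_minibatch_buggy(lengths, new_max_len):
    length_set = set(lengths)
    if (len(length_set) > 1) or (max(length_set) > new_max_len):
        original_max_len = min(max(length_set), new_max_len)
    # If every example already has length new_max_len, the branch above is
    # skipped and original_max_len is never assigned -> UnboundLocalError.
    return original_max_len


def pad_minibatch_fixed(lengths, new_max_len):
    length_set = set(lengths)
    # Bind the variable unconditionally first...
    original_max_len = max(length_set)
    if (len(length_set) > 1) or (max(length_set) > new_max_len):
        # ...and only adjust it when padding/truncation is needed.
        original_max_len = min(max(length_set), new_max_len)
    return original_max_len


try:
    pad_minibatch_buggy([2048, 2048, 2048], 2048)
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError

print(pad_minibatch_fixed([2048, 2048, 2048], 2048))  # 2048
```

The sketch only shows why the reported values (len = 1, max = 2048, new_max_len = 2048) leave the name unbound; how the maintainers actually patched it is not described in this thread.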

And this is how I am calling the InSilicoPerturber class:

from geneformer import InSilicoPerturber, InSilicoPerturberStats

isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000136574"],
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=3,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data={"cell_type":["Cardiomyocyte1","Cardiomyocyte2","Cardiomyocyte3"]},
                        cell_states_to_model={'state_key': 'disease', 
                                              'start_state': 'dcm', 
                                              'goal_state': 'nf', 
                                              'alt_states': ['hcm']},
                        max_ncells=2000,
                        emb_layer=0,
                        forward_batch_size=70,
                        nproc=16)
isp.perturb_data("./CellClassifier_cardiomyopathies/",
                 "./human_dcm_hcm_nf.dataset",
                 "./perturb_out/",
                 "inter_")

ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed="all",
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model={"disease":(["dcm"],["nf"],["hcm"])})

Any idea as to why this is happening?

Hi there – thanks for your interest in Geneformer! We've recently made a lot of changes to the code, and that error slipped through. We've just fixed it. Thanks for the clear explanation!

ctheodoris changed discussion status to closed

Hi David, thanks for looking into this. Your updated script seems to have fixed that error, but the run now fails in in_silico_perturber_stats: gene_list appears to contain a mix of tuples and integers, which Python cannot sort together. Any idea what is causing this?

Traceback (most recent call last):
  File "/home/ubuntu/Geneformer/examples/in_silico_perturbation.py", line 39, in <module>
    ispstats.get_stats("./perturb_out/",
  File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py", line 678, in get_stats
    gene_list = get_gene_list(dict_list, "cell")
  File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/geneformer/in_silico_perturber_stats.py", line 83, in get_gene_list
    gene_list.sort()
TypeError: '<' not supported between instances of 'tuple' and 'int'
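For anyone hitting the same TypeError, the sketch below reproduces it on a made-up gene_list and shows two quick checks. It is only an illustration under assumptions: the values in gene_list and the helper names (sort_raises_type_error, key_type_counts, as_tuple) are hypothetical, not part of Geneformer. Counting entries by type helps locate where the tuples come from; the as_tuple sort key merely makes the sort succeed and does not explain why tuples and integers ended up in the same list, which is the question the maintainers asked to be raised separately.

```python
from collections import Counter

# Hypothetical stand-in for gene_list: plain token ids (int) mixed with a
# tuple of token ids, as described in the post. Values are made up.
gene_list = [(5, 7), 3, 12]


def sort_raises_type_error(items):
    """Return True if sorting items raises a TypeError (int vs tuple)."""
    try:
        sorted(items)
    except TypeError:
        return True
    return False


def key_type_counts(items):
    """Count entries by Python type to see which kinds of keys are mixed."""
    return Counter(type(item).__name__ for item in items)


def as_tuple(item):
    """Sort key that wraps scalars in a 1-tuple so all keys are comparable."""
    return item if isinstance(item, tuple) else (item,)


print(sort_raises_type_error(gene_list))  # True
print(key_type_counts(gene_list))         # Counter({'int': 2, 'tuple': 1})
print(sorted(gene_list, key=as_tuple))    # [3, (5, 7), 12]
```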
Pingiotto changed discussion status to open

Thank you for your question! Since this is a separate question from the initial one here, please open a new discussion with a title descriptive of this question. This will be helpful for others who may have the same question later. Please also include the details of the options you are using to run the stats so we can reproduce the error and help troubleshoot. Thank you!

ctheodoris changed discussion status to closed
