Error when running in_silico_perturber example

#293
by nlapier2 - opened

Hello,

I am trying to run the in_silico_perturber example, but I'm running into an error near the end when creating the InSilicoPerturberStats object. I have created a slight modification of the examples/in_silico_perturbation.ipynb script. Below I've put the code I'm running, the intermediate output files, and the error traceback. Thanks in advance for your time and attention!

The code is as follows:

new_in_silico_perturbation.py

from geneformer import InSilicoPerturber
from geneformer import InSilicoPerturberStats
from geneformer import EmbExtractor
from datasets import load_dataset, load_from_disk

print('1')
cell_states_to_model = {"state_key": "disease",
                        "start_state": "dcm",
                        "goal_state": "nf",
                        "alt_states": ["hcm"]}

print('2')
filter_data_dict={"cell_type":["Cardiomyocyte1","Cardiomyocyte2","Cardiomyocyte3"]}

print('3')
embex = EmbExtractor(model_type="CellClassifier",
                     num_classes=3,
                     filter_data=filter_data_dict,
                     max_ncells=1000,
                     emb_layer=0,
                     summary_stat="exact_mean",
                     forward_batch_size=16,  # 256
                     nproc=16)

print('4')
state_embs_dict = embex.get_state_embs(cell_states_to_model,
                                       "../fine_tuned_models/geneformer-6L-30M_CellClassifier_cardiomyopathies_220224",
                                       "../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                                       "test_output",
                                       "output_prefix")

print('5')
isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb="all",
                        combos=0,
                        anchor_gene=None,
                        model_type="CellClassifier",
                        num_classes=3,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=filter_data_dict,
                        cell_states_to_model=cell_states_to_model,
                        state_embs_dict=state_embs_dict,
                        max_ncells=500,  # 2000
                        emb_layer=0,
                        forward_batch_size=16,  # 400
                        nproc=16)

print('6')
isp.perturb_data("../fine_tuned_models/geneformer-6L-30M_CellClassifier_cardiomyopathies_220224",
                 "../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                 "test_output",
                 "output_prefix")

print('7')
ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed="all",
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model=cell_states_to_model)

print('8')
ispstats.get_stats("../../Genecorpus-30M/example_input_files/cell_classification/disease_classification/human_dcm_hcm_nf.dataset",
                   None,
                   "test_output",
                   "output_prefix")

The current contents of the test_output directory are:

$ ls test_output/
in_silico_delete_output_prefix_dict_cell_embs_1Kbatch0_raw.pickle   output_prefix.pkl
in_silico_delete_output_prefix_dict_cell_embs_1Kbatch-1_raw.pickle

The error is:

Traceback (most recent call last):             
  File "new_in_silico_perturbation.py", line 61, in <module>
    ispstats = InSilicoPerturberStats(mode="goal_state_shift",
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "in_silico_perturber_stats.py", line 708, in __init__
    self.gene_name_id_dict = pickle.load(f)
                             ^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.

Thank you for your interest in Geneformer! This error can occur when you have downloaded only a Git LFS pointer to the dictionary rather than the dictionary itself, which happens if Git LFS is not enabled. Please either enable Git LFS before cloning the repository (see the model card for instructions) or download the token dictionary directly (e.g. wget the file by its download link) and place it in your geneformer directory to test whether that resolves the issue.
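
A Git LFS pointer is a small text file whose first line begins with "version https://git-lfs.github.com/spec/v1", which would explain why pickle complains about the load key 'v'. As a rough sketch of how one might check for this (looks_like_lfs_pointer is a hypothetical helper name, not part of Geneformer):

```python
# First bytes of a Git LFS pointer file, as defined by the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

Calling this on the dictionary file inside your geneformer directory should return True if only the pointer was downloaded, and False for a real pickle file.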

ctheodoris changed discussion status to closed

Hello,

I am running the in_silico_perturber example as well and I was receiving the same error (Error: _pickle.UnpicklingError: invalid load key, 'v'.). I have ensured that git lfs is enabled and I have also used wget to directly download the token dictionary into my geneformer directory, but I am still receiving the same error. Are there any other suggestions to fix this error? Thank you!

Thank you for your question! First, try loading the file you downloaded with pickle.load. If it loads, check that you have deleted the prior version and that the Geneformer modules are pointing to the right file. If it does not load, wget should have worked, but you can always download the file manually by pressing the download arrow button to the right of it.
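
The two checks suggested here could be sketched roughly as follows. Both try_unpickle and installed_package_dir are hypothetical helper names introduced for illustration; they rely only on the Python standard library, not on any Geneformer API.

```python
import importlib.util
import pickle
from pathlib import Path


def try_unpickle(path):
    """Try to load a pickle file; return (True, obj) on success, (False, error) otherwise."""
    try:
        with open(path, "rb") as f:
            return True, pickle.load(f)
    except (pickle.UnpicklingError, EOFError, OSError) as err:
        return False, err


def installed_package_dir(package_name):
    """Return the directory Python would import the package from, or None if not found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent
```

Running try_unpickle on the downloaded file shows whether it is a valid pickle, and installed_package_dir("geneformer") shows which copy of the package is actually being imported, so you can confirm the new dictionary file was placed in that directory and not in a different clone.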
