TypeError: argument of type PosixPath is not iterable

#339
by romerm16 - opened

I'm running the official example for tokenizer.py
on a loom file I created from my own data.

I'm using the following versions:
Ubuntu: 22.04
Python: 3.10
loompy: 3.0.7
scanpy: 1.9.3
geneformer: 0.1.0

The code is as follows (from the documentation):
from geneformer import TranscriptomeTokenizer
tk = TranscriptomeTokenizer({"cell_type": "cell_type", "organ_major": "organ"}, nproc=4)
tk.tokenize_data("data_directory", "output_directory", "output_prefix")

I get the following error:
TypeError: argument of type 'PosixPath' is not iterable

I solved it by casting output_path to a string on line 178 of tokenizer.py, replacing:
output_path = (Path(output_directory) / output_prefix).with_suffix(".dataset")
by:
output_path = str((Path(output_directory) / output_prefix).with_suffix(".dataset"))
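For anyone hitting the same error, here is a minimal sketch of why the cast helps. It assumes the failure comes from a substring check (the `in` operator) applied to the path somewhere downstream. The `has_scheme` function is a hypothetical stand-in I wrote for illustration, not the actual Hugging Face Datasets code, and the directory names are the placeholders from the example above.

```python
from pathlib import Path


def has_scheme(path):
    # Hypothetical stand-in for a downstream check that expects a string.
    # The `in` operator needs a container or iterable on the right-hand side;
    # a Path object is neither, so passing one raises TypeError.
    return "://" in path


# Same construction as on line 178 of tokenizer.py, with placeholder names.
output_path = (Path("output_directory") / "output_prefix").with_suffix(".dataset")

caught = False
try:
    has_scheme(output_path)  # Path object: raises TypeError
except TypeError as err:
    caught = True
    print(f"TypeError: {err}")

# Casting to str first makes the same check work.
result = has_scheme(str(output_path))
print(result)
```

The cast only changes the type handed to the downstream call; the path on disk is the same.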

Thank you for bringing this up! This appears to be environment-dependent: we do not encounter it in every environment. We will patch this here, but since this is an issue with Hugging Face Datasets (from which save_to_disk is imported), we would encourage you to open an issue with them so that they can resolve it directly.

It has been patched here; please pull the updated version. It would also be great if you opened this issue with Hugging Face Datasets so they can resolve it on their end. Thank you!

ctheodoris changed discussion status to closed
