IDEFICS_Data_Measurement_Tool / log_files /run_data_measurements.log
2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 20:05:57,722:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 20:05:58,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
Couldn't find a dataset script at /Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/OBELICS/OBELICS.py or any data file in the same directory. Couldn't find 'OBELICS' on the Hugging Face Hub either: FileNotFoundError: Couldn't find file at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/OBELICS/OBELICS.py
Traceback (most recent call last):
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
    pass_args_to_DMT(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
    load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
    dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
    self.dset = self._get_dataset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
    dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 219, in load_truncated_dataset
    full_dataset = load_dataset(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1656, in load_dataset
    builder_instance = load_dataset_builder(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1439, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1189, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/OBELICS/OBELICS.py or any data file in the same directory. Couldn't find 'OBELICS' on the Hugging Face Hub either: FileNotFoundError: Couldn't find file at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/OBELICS/OBELICS.py
2023-08-21 20:05:58,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-21 20:05:58,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-21 20:08:11,924:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:10%]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:26:49,878:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:26:49,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:26:49,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:27:08,087:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 22:43:25,230:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='ri', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 22:54:30,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:54:30,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:54:30,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:100]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:54:30,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-21 22:55:32,445:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:10]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 00:14:53,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:14:53,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:14:53,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:14:53,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 00:26:39,298:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:26:39,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:26:39,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 00:27:30,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'text'
Traceback (most recent call last):
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
    pass_args_to_DMT(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
    load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
    dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__
    self.load_or_prepare_text_dataset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset
    self.prepare_text_dset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset
    self.text_dset = self.dset.map(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map
    return self._map_single(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda>
    lambda examples: ds_utils.extract_field(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field
    item_list = examples[field_path[0]]
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__
    values = super().__getitem__(key)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__
    raise KeyError(key)
KeyError: 'text'
2023-08-22 00:27:30,058:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 00:27:30,059:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:32:41,236:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 00:33:14,893:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'text'
Traceback (most recent call last):
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
    pass_args_to_DMT(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
    load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
    dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__
    self.load_or_prepare_text_dataset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset
    self.prepare_text_dset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset
    self.text_dset = self.dset.map(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map
    return self._map_single(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda>
    lambda examples: ds_utils.extract_field(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field
    item_list = examples[field_path[0]]
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__
    values = super().__getitem__(key)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__
    raise KeyError(key)
KeyError: 'text'
2023-08-22 00:33:14,916:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 00:33:14,916:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 00:41:09,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 00:41:40,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'text'
Traceback (most recent call last):
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
    pass_args_to_DMT(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
    load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
    dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__
    self.load_or_prepare_text_dataset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset
    self.prepare_text_dset()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset
    self.text_dset = self.dset.map(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map
    return self._map_single(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda>
    lambda examples: ds_utils.extract_field(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field
    item_list = examples[field_path[0]]
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__
    values = super().__getitem__(key)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__
    raise KeyError(key)
KeyError: 'text'
2023-08-22 00:41:40,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 00:41:40,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 01:02:57,529:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 01:11:04,846:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
Object of type bytes is not JSON serializable
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
self.dset = self._get_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 217, in load_truncated_dataset
_ = f.write(json.dumps(row) + "\n")
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
2023-08-22 01:11:04,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 01:11:04,880:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 01:39:20,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 01:48:32,904:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 01:48:32,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 01:48:32,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 01:48:53,726:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 01:56:55,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
Object of type bytes is not JSON serializable
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
self.dset = self._get_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 216, in load_truncated_dataset
_ = f.write(json.dumps(row) + "\n")
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
2023-08-22 01:56:55,133:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 01:56:55,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:15:35,252:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:16:08,141:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
type object 'Dataset' has no attribute 'from_generator'
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
self.dset = self._get_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset
dataset = Dataset.from_generator(gen, features=iterable_dataset.features)
AttributeError: type object 'Dataset' has no attribute 'from_generator'
2023-08-22 02:16:08,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:16:08,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:16:54,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:16:54,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:16:54,320:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:16:54,320:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:17:25,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
type object 'Dataset' has no attribute 'from_generator'
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
self.dset = self._get_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset
dataset = Dataset.from_generator(gen, features=iterable_dataset.features)
AttributeError: type object 'Dataset' has no attribute 'from_generator'
2023-08-22 02:17:25,835:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:17:25,835:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:22:04,256:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:22:04,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:22:04,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:22:24,147:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:23:00,157:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
type object 'Dataset' has no attribute 'from_generator'
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__
self.dset = self._get_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset
dataset = Dataset.from_generator(gen, features=iterable_dataset.features)
AttributeError: type object 'Dataset' has no attribute 'from_generator'
2023-08-22 02:23:00,176:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:23:00,176:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:25:16,788:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:25:51,681:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'text'
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args,
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__
self.load_or_prepare_text_dataset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset
self.prepare_text_dset()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset
self.text_dset = self.dset.map(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 592, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3097, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3474, in _map_single
batch = apply_function_on_filtered_inputs(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3353, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda>
lambda examples: ds_utils.extract_field(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 361, in extract_field
item_list = examples[field_path[0]]
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/formatting/formatting.py", line 270, in __getitem__
value = self.data[key]
KeyError: 'text'
2023-08-22 02:25:51,704:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:25:51,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:28:13,955:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:28:44,379:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64
Tokenizing dataset.
2023-08-22 02:28:44,521:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66
Calculating vocab.
2023-08-22 02:28:44,874:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'CountVectorizer' object has no attribute 'get_feature_names_out'
Traceback (most recent call last):
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
pass_args_to_DMT(
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 67, in load_or_prepare
dstats.load_or_prepare_vocab()
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 373, in load_or_prepare_vocab
word_count_df = count_vocab_frequencies(self.tokenized_df)
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 558, in count_vocab_frequencies
[np.sum(tf, axis=0)], columns=cvec.get_feature_names_out()
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names_out'
2023-08-22 02:28:44,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:28:44,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:30:47,498:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 02:31:19,187:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64
Tokenizing dataset.
2023-08-22 02:31:19,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66
Calculating vocab.
2023-08-22 02:31:19,594:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333
'CountVectorizer' object has no attribute 'get_feature_names_out'
Traceback (most recent call last):
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main
    pass_args_to_DMT(
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT
    load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 67, in load_or_prepare
    dstats.load_or_prepare_vocab()
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 373, in load_or_prepare_vocab
    word_count_df = count_vocab_frequencies(self.tokenized_df)
  File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 558, in count_vocab_frequencies
    [np.sum(tf, axis=0)], columns=cvec.get_feature_names_out()
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names_out'
2023-08-22 02:31:19,595:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341
Data measurements not computed. ☹️
2023-08-22 02:31:19,595:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues
2023-08-22 03:40:26,643:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 03:40:26,644:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 03:40:26,644:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 03:41:31,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 03:41:31,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 03:41:31,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 03:42:37,372:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 03:46:15,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64
Tokenizing dataset.
2023-08-22 03:48:12,829:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66
Calculating vocab.
2023-08-22 03:50:15,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:73
* Calculating general statistics.
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:75
Done!
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:76
Basic text statistics now available at cache_dir/HuggingFaceM4/OBELICS_default_train_texts/general_stats_dict.json.
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:80
* Calculating text duplicates.
2023-08-22 03:50:40,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:83
If all went well, then results are in the following files:
2023-08-22 03:50:40,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85
statistics: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.json
2023-08-22 03:50:40,706:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85
html: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.html
2023-08-22 03:50:40,706:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:88
* Calculating text lengths.
2023-08-22 03:52:44,734:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:97
* Calculating label statistics.
2023-08-22 03:52:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:99
No label field found.
2023-08-22 03:52:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:100
No label statistics to calculate.
2023-08-22 05:08:25,998:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147
Not using any cache; starting afresh
2023-08-22 05:10:18,037:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64
Tokenizing dataset.
2023-08-22 05:11:03,365:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66
Calculating vocab.
2023-08-22 05:11:51,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:73
* Calculating general statistics.
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:75
Done!
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:76
Basic text statistics now available at cache_dir/HuggingFaceM4/OBELICS_default_train_texts/general_stats_dict.json.
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:80
* Calculating text duplicates.
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:83
If all went well, then results are in the following files:
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85
statistics: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.json
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85
html: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.html
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:88
* Calculating text lengths.
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:97
* Calculating label statistics.
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:99
No label field found.
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:100
No label statistics to calculate.
2023-08-22 05:25:43,095:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174
Label column name not given. Assuming it's 'label'.
2023-08-22 05:25:43,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280
Proceeding with the following arguments:
2023-08-22 05:25:43,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True)