2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 20:05:57,721:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 20:05:57,722:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 20:05:58,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
Couldn't find a dataset script at /Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/OBELICS/OBELICS.py or any data file in the same directory. Couldn't find 'OBELICS' on the Hugging Face Hub either: FileNotFoundError: Couldn't find file at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/OBELICS/OBELICS.py | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 219, in load_truncated_dataset | |
full_dataset = load_dataset( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1656, in load_dataset | |
builder_instance = load_dataset_builder( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1439, in load_dataset_builder | |
dataset_module = dataset_module_factory( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/load.py", line 1189, in dataset_module_factory | |
raise FileNotFoundError( | |
FileNotFoundError: Couldn't find a dataset script at /Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/OBELICS/OBELICS.py or any data file in the same directory. Couldn't find 'OBELICS' on the Hugging Face Hub either: FileNotFoundError: Couldn't find file at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/OBELICS/OBELICS.py | |
2023-08-21 20:05:58,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-21 20:05:58,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-21 20:08:11,924:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 20:08:11,925:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:10%]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 21:52:44,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:26:00,109:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:26:49,878:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:26:49,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:26:49,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:27:08,087:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:27:08,088:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 22:43:25,230:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='ri', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:43:25,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 22:54:30,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:54:30,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:54:30,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:100]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:54:30,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-21 22:55:32,445:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train[:10]', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-21 22:55:32,446:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 00:14:53,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:14:53,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:14:53,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:14:53,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 00:26:39,298:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:26:39,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:26:39,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:26:58,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 00:27:30,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'text' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__ | |
self.load_or_prepare_text_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset | |
self.prepare_text_dset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset | |
self.text_dset = self.dset.map( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map | |
return self._map_single( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper | |
out = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single | |
batch = apply_function_on_filtered_inputs( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs | |
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated | |
result = f(decorated_item, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda> | |
lambda examples: ds_utils.extract_field( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field | |
item_list = examples[field_path[0]] | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__ | |
values = super().__getitem__(key) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__ | |
raise KeyError(key) | |
KeyError: 'text' | |
2023-08-22 00:27:30,058:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 00:27:30,059:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:31:42,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:32:41,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:32:41,236:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 00:33:14,893:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'text' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__ | |
self.load_or_prepare_text_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset | |
self.prepare_text_dset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset | |
self.text_dset = self.dset.map( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map | |
return self._map_single( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper | |
out = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single | |
batch = apply_function_on_filtered_inputs( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs | |
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated | |
result = f(decorated_item, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda> | |
lambda examples: ds_utils.extract_field( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field | |
item_list = examples[field_path[0]] | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__ | |
values = super().__getitem__(key) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__ | |
raise KeyError(key) | |
KeyError: 'text' | |
2023-08-22 00:33:14,916:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 00:33:14,916:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 00:41:09,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 00:41:09,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 00:41:40,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'text' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__ | |
self.load_or_prepare_text_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset | |
self.prepare_text_dset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset | |
self.text_dset = self.dset.map( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map | |
return self._map_single( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper | |
out = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single | |
batch = apply_function_on_filtered_inputs( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs | |
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated | |
result = f(decorated_item, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda> | |
lambda examples: ds_utils.extract_field( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 358, in extract_field | |
item_list = examples[field_path[0]] | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 132, in __getitem__ | |
values = super().__getitem__(key) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/collections/__init__.py", line 1058, in __getitem__ | |
raise KeyError(key) | |
KeyError: 'text' | |
2023-08-22 00:41:40,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 00:41:40,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 01:02:57,529:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 01:02:57,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 01:11:04,846:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
Object of type bytes is not JSON serializable | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 217, in load_truncated_dataset | |
_ = f.write(json.dumps(row) + "\n") | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps | |
return _default_encoder.encode(obj) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode | |
chunks = self.iterencode(o, _one_shot=True) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode | |
return _iterencode(o, 0) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default | |
raise TypeError(f'Object of type {o.__class__.__name__} ' | |
TypeError: Object of type bytes is not JSON serializable | |
2023-08-22 01:11:04,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 01:11:04,880:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 01:39:04,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 01:39:20,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 01:39:20,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 01:48:32,904:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 01:48:32,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 01:48:32,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 01:48:53,726:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 01:48:53,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 01:56:55,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
Object of type bytes is not JSON serializable | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 216, in load_truncated_dataset | |
_ = f.write(json.dumps(row) + "\n") | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 231, in dumps | |
return _default_encoder.encode(obj) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 199, in encode | |
chunks = self.iterencode(o, _one_shot=True) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 257, in iterencode | |
return _iterencode(o, 0) | |
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/encoder.py", line 179, in default | |
raise TypeError(f'Object of type {o.__class__.__name__} ' | |
TypeError: Object of type bytes is not JSON serializable | |
2023-08-22 01:56:55,133:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 01:56:55,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/m4-bias-eval-fair-face', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:15:01,934:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:15:35,252:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:15:35,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:16:08,141:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
type object 'Dataset' has no attribute 'from_generator' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset | |
dataset = Dataset.from_generator(gen, features=iterable_dataset.features) | |
AttributeError: type object 'Dataset' has no attribute 'from_generator' | |
2023-08-22 02:16:08,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:16:08,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:16:54,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:16:54,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:16:54,320:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:16:54,320:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:17:25,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
type object 'Dataset' has no attribute 'from_generator' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset | |
dataset = Dataset.from_generator(gen, features=iterable_dataset.features) | |
AttributeError: type object 'Dataset' has no attribute 'from_generator' | |
2023-08-22 02:17:25,835:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:17:25,835:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:22:04,256:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:22:04,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:22:04,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:22:24,147:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:22:24,148:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:23:00,157:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
type object 'Dataset' has no attribute 'from_generator' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 229, in __init__ | |
self.dset = self._get_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 240, in _get_dataset | |
dset = ds_utils.load_truncated_dataset(self.dset_name, self.dset_config, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 214, in load_truncated_dataset | |
dataset = Dataset.from_generator(gen, features=iterable_dataset.features) | |
AttributeError: type object 'Dataset' has no attribute 'from_generator' | |
2023-08-22 02:23:00,176:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:23:00,176:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:25:16,788:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:25:16,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:25:51,681:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'text' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 62, in load_or_prepare | |
dstats = dataset_statistics.DatasetStatisticsCacheClass(**dataset_args, | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 232, in __init__ | |
self.load_or_prepare_text_dataset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 261, in load_or_prepare_text_dataset | |
self.prepare_text_dset() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 274, in prepare_text_dset | |
self.text_dset = self.dset.map( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 592, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 557, in wrapper | |
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3097, in map | |
for rank, done, content in Dataset._map_single(**dataset_kwargs): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3474, in _map_single | |
batch = apply_function_on_filtered_inputs( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3353, in apply_function_on_filtered_inputs | |
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 275, in <lambda> | |
lambda examples: ds_utils.extract_field( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/utils/dataset_utils.py", line 361, in extract_field | |
item_list = examples[field_path[0]] | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/venv/lib/python3.9/site-packages/datasets/formatting/formatting.py", line 270, in __getitem__ | |
value = self.data[key] | |
KeyError: 'text' | |
2023-08-22 02:25:51,704:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:25:51,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['text'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:28:13,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:28:13,955:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:28:44,379:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64 | |
Tokenizing dataset. | |
2023-08-22 02:28:44,521:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66 | |
Calculating vocab. | |
2023-08-22 02:28:44,874:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'CountVectorizer' object has no attribute 'get_feature_names_out' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 67, in load_or_prepare | |
dstats.load_or_prepare_vocab() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 373, in load_or_prepare_vocab | |
word_count_df = count_vocab_frequencies(self.tokenized_df) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 558, in count_vocab_frequencies | |
[np.sum(tf, axis=0)], columns=cvec.get_feature_names_out() | |
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names_out' | |
2023-08-22 02:28:44,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:28:44,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:29:57,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:30:47,498:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 02:30:47,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 02:31:19,187:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64 | |
Tokenizing dataset. | |
2023-08-22 02:31:19,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66 | |
Calculating vocab. | |
2023-08-22 02:31:19,594:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:333 | |
'CountVectorizer' object has no attribute 'get_feature_names_out' | |
Traceback (most recent call last): | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 309, in main | |
pass_args_to_DMT( | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 160, in pass_args_to_DMT | |
load_or_prepare(dataset_args, calculation=calculation, use_cache=use_cache) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py", line 67, in load_or_prepare | |
dstats.load_or_prepare_vocab() | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 373, in load_or_prepare_vocab | |
word_count_df = count_vocab_frequencies(self.tokenized_df) | |
File "/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py", line 558, in count_vocab_frequencies | |
[np.sum(tf, axis=0)], columns=cvec.get_feature_names_out() | |
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names_out' | |
2023-08-22 02:31:19,595:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:341 | |
Data measurements not computed. ☹️ | |
2023-08-22 02:31:19,595:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:342 | |
An error occurred in computing data measurements for dataset with arguments: Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True). Feel free to make an issue here: https://github.com/huggingface/data-measurements-tool/issues | |
2023-08-22 03:40:26,643:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 03:40:26,644:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 03:40:26,644:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 03:41:31,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 03:41:31,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 03:41:31,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 03:42:37,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 03:42:37,372:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 03:46:15,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64 | |
Tokenizing dataset. | |
2023-08-22 03:48:12,829:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66 | |
Calculating vocab. | |
2023-08-22 03:50:15,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:73 | |
* Calculating general statistics. | |
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:75 | |
Done! | |
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:76 | |
Basic text statistics now available at cache_dir/HuggingFaceM4/OBELICS_default_train_texts/general_stats_dict.json. | |
2023-08-22 03:50:32,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:80 | |
* Calculating text duplicates. | |
2023-08-22 03:50:40,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:83 | |
If all went well, then results are in the following files: | |
2023-08-22 03:50:40,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85 | |
statistics: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.json | |
2023-08-22 03:50:40,706:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85 | |
html: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.html | |
2023-08-22 03:50:40,706:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:88 | |
* Calculating text lengths. | |
2023-08-22 03:52:44,734:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:97 | |
* Calculating label statistics. | |
2023-08-22 03:52:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:99 | |
No label field found. | |
2023-08-22 03:52:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:100 | |
No label statistics to calculate. | |
2023-08-22 05:08:25,998:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |
2023-08-22 05:08:25,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:147 | |
Not using any cache; starting afresh | |
2023-08-22 05:10:18,037:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:64 | |
Tokenizing dataset. | |
2023-08-22 05:11:03,365:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:66 | |
Calculating vocab. | |
2023-08-22 05:11:51,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:73 | |
* Calculating general statistics. | |
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:75 | |
Done! | |
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:76 | |
Basic text statistics now available at cache_dir/HuggingFaceM4/OBELICS_default_train_texts/general_stats_dict.json. | |
2023-08-22 05:12:01,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:80 | |
* Calculating text duplicates. | |
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:83 | |
If all went well, then results are in the following files: | |
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85 | |
statistics: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.json | |
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:85 | |
html: cache_dir/HuggingFaceM4/OBELICS_default_train_texts/text_duplicates/text_duplicates.html | |
2023-08-22 05:12:07,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:88 | |
* Calculating text lengths. | |
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:97 | |
* Calculating label statistics. | |
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:99 | |
No label field found. | |
2023-08-22 05:12:46,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:100 | |
No label statistics to calculate. | |
2023-08-22 05:25:43,095:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:174 | |
Label column name not given. Assuming it's 'label'. | |
2023-08-22 05:25:43,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:280 | |
Proceeding with the following arguments: | |
2023-08-22 05:25:43,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/run_data_measurements.py, run_data_measurements:281 | |
Namespace(dataset='HuggingFaceM4/OBELICS', config='default', split='train', feature=['texts'], calculation=None, label_field='label', label_names=[], use_cache=False, out_dir='cache_dir', overwrite_previous=False, email=None, push_cache_to_hub=False, prepare_GUI_data=False, keep_local=True) | |