Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
Passed argument batch_size = auto:4.0. Detecting largest batch size
Determined largest batch size: 64
hf (pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto:4 (64,64,64,64,64)
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.2872|±  |0.0045|
|         |       |none  |     0|acc_norm|↑  |0.3082|±  |0.0046|

2024-10-17 23:20:11.330614: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-17 23:20:11.352779: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-17 23:20:11.358651: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-17 23:20:11.372585: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-17 23:20:12.527182: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-10-17:23:20:16,572 INFO [__main__.py:279] Verbosity set to INFO
2024-10-17:23:20:27,362 INFO [__main__.py:376] Selected Tasks: ['hellaswag']
2024-10-17:23:20:27,366 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-10-17:23:20:27,366 INFO [evaluator.py:201] Initializing hf model, with arguments: {'pretrained': 'EleutherAI/pythia-160m', 'revision': 'step100000', 'dtype': 'float'}
2024-10-17:23:20:27,450 INFO [huggingface.py:129] Using device 'cuda'
2024-10-17:23:20:27,706 INFO [huggingface.py:481] Using model type 'default'
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default.
For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
2024-10-17:23:20:29,024 INFO [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
Downloading data: 100%|██████████| 47.5M/47.5M [00:00<00:00, 169MB/s]
Downloading data: 100%|██████████| 11.8M/11.8M [00:00<00:00, 143MB/s]
Downloading data: 100%|██████████| 12.2M/12.2M [00:00<00:00, 173MB/s]
Generating train split: 100%|██████████| 39905/39905 [00:04<00:00, 9713.15 examples/s]
Generating test split: 100%|██████████| 10003/10003 [00:00<00:00, 10216.87 examples/s]
Generating validation split: 100%|██████████| 10042/10042 [00:01<00:00, 9867.41 examples/s]
Map: 100%|██████████| 39905/39905 [00:07<00:00, 5426.38 examples/s]
Map: 100%|██████████| 10042/10042 [00:01<00:00, 5869.01 examples/s]
2024-10-17:23:21:05,908 WARNING [model.py:422] model.chat_template was called with the chat_template set to False or None. Therefore no chat template will be applied. Make sure this is an intended behavior.
2024-10-17:23:21:05,909 INFO [task.py:415] Building contexts for hellaswag on rank 0...
100%|██████████| 10042/10042 [00:05<00:00, 1945.63it/s]
2024-10-17:23:21:12,103 INFO [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████| 40168/40168 [04:01<00:00, 166.51it/s]
fatal: not a git repository (or any of the parent directories): .git
2024-10-17:23:25:41,634 INFO [evaluation_tracker.py:206] Saving results aggregated
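The Stderr column can be sanity-checked by hand. A minimal sketch, assuming the harness reports the standard error of the mean over per-example 0/1 scores (sample standard deviation with Bessel's correction, divided by sqrt(n)) and that n is the 10042 validation examples counted in the log. For binary scores this reduces to sqrt(p(1-p)/(n-1)). The accuracies fed in are the rounded table values, so the check is approximate:

```python
import math


def bernoulli_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores whose mean is p.

    The Bessel-corrected sample variance of 0/1 scores is n*p*(1-p)/(n-1);
    dividing by n and taking the square root gives sqrt(p*(1-p)/(n-1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# Assumption: n equals the number of HellaSwag validation examples in the log.
N_EXAMPLES = 10042

for metric, value in [("acc", 0.2872), ("acc_norm", 0.3082)]:
    se = bernoulli_mean_stderr(value, N_EXAMPLES)
    print(f"{metric}: {value:.4f} ± {se:.4f}")
```

Both estimates round to the reported 0.0045 and 0.0046, which is consistent with that assumption.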
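The request count matches four candidate endings per HellaSwag example: 10042 contexts × 4 endings = 40168 loglikelihood requests. A sketch of how the two reported metrics could be derived from those scores, assuming acc takes the argmax of each ending's summed log-likelihood and acc_norm first divides each log-likelihood by the ending's length in characters. That is my understanding of the harness's multiple-choice convention, so treat it as an assumption. The function name `score_example` and the toy numbers are hypothetical, not values from this run:

```python
from typing import Sequence, Tuple


def score_example(
    loglikelihoods: Sequence[float], endings: Sequence[str], gold: int
) -> Tuple[int, int]:
    """Return (acc, acc_norm) for one multiple-choice example (hypothetical helper)."""
    indices = range(len(endings))
    # acc: pick the ending with the highest total log-likelihood.
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: normalize by character length so longer endings are not penalized
    # simply for containing more tokens.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(endings[i]))
    return int(pred == gold), int(pred_norm == gold)


N_EXAMPLES = 10042          # contexts built for hellaswag in the log
ENDINGS_PER_EXAMPLE = 4     # candidate endings scored per context
print(N_EXAMPLES * ENDINGS_PER_EXAMPLE)  # 40168

# Toy example: endings of 10, 40, 20 and 20 characters, correct answer at index 1.
toy_lls = [-10.0, -20.0, -30.0, -25.0]
toy_endings = ["x" * 10, "x" * 40, "x" * 20, "x" * 20]
print(score_example(toy_lls, toy_endings, gold=1))  # (0, 1)
```

In the toy case the raw argmax favors the short first ending, while length normalization recovers the longer correct one. This is the kind of effect that would make acc_norm (0.3082) come out higher than acc (0.2872).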