Submitting job: /common/home/users/d/dh.huang.2023/code/logical-reasoning/scripts/eval-mgtv.sh
Current Directory: /common/home/users/d/dh.huang.2023/code/logical-reasoning
Thu Jul 18 10:01:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     On  |   00000000:01:00.0 Off |                    0 |
| N/A   40C    P8             37W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Linux lagoon 4.18.0-553.5.1.el8_10.x86_64 #1 SMP Thu Jun 6 09:41:19 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

NAME="Rocky Linux"
VERSION="8.10 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.10 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.10"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7763 64-Core Processor
Stepping:            1
CPU MHz:             2450.000
CPU max MHz:         3529.0520
CPU min MHz:         1500.0000
BogoMIPS:            4891.15
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-127
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

MemTotal:       527669148 kB

Eval shenzhi-wang/Llama3-8B-Chinese-Chat with llama-factory/saves/llama3-8b/lora/sft_bf16_p1_full
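The header records the node (one NVIDIA L40 with 46 GB, AMD EPYC 7763, Rocky Linux 8.10) and announces an evaluation of shenzhi-wang/Llama3-8B-Chinese-Chat with the LoRA adapter saved under llama-factory/saves/llama3-8b/lora/sft_bf16_p1_full. The log below only shows the base model being loaded, so the adapter-attachment step is not visible here; the following is a minimal sketch of one common way to attach such an adapter with peft. The model name and adapter path come from the header; everything else (peft usage, device_map) is an assumption.

```python
import torch
from peft import PeftModel  # assumes the adapter was saved in PEFT format
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "shenzhi-wang/Llama3-8B-Chinese-Chat"
ADAPTER_PATH = "llama-factory/saves/llama3-8b/lora/sft_bf16_p1_full"  # from the job header

# Load the bf16 base model onto the single L40, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
model.eval()
```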
[INFO|tokenization_utils_base.py:2108] 2024-07-18 10:01:18,095 >> loading file tokenizer.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/tokenizer.json
[INFO|tokenization_utils_base.py:2108] 2024-07-18 10:01:18,096 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-07-18 10:01:18,096 >> loading file special_tokens_map.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/special_tokens_map.json
[INFO|tokenization_utils_base.py:2108] 2024-07-18 10:01:18,096 >> loading file tokenizer_config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/tokenizer_config.json
[WARNING|logging.py:314] 2024-07-18 10:01:18,700 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
[INFO|configuration_utils.py:733] 2024-07-18 10:01:18,962 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/config.json
[INFO|configuration_utils.py:796] 2024-07-18 10:01:18,962 >> Model config LlamaConfig {
  "_name_or_path": "shenzhi-wang/Llama3-8B-Chinese-Chat",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|modeling_utils.py:3474] 2024-07-18 10:01:19,046 >> loading weights file model.safetensors from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-07-18 10:01:19,054 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-07-18 10:01:19,055 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]   (shard-loading progress output truncated in the captured log)
All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4288] 2024-07-18 10:01:38,082 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at shenzhi-wang/Llama3-8B-Chinese-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
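The block above is the standard transformers loading sequence: the tokenizer files, config.json, and the four model.safetensors shards are all resolved from the Hugging Face cache under /common/scratch/users/d/dh.huang.2023/transformers/hub, and the model is instantiated in torch.bfloat16 (the checkpoint's torch_dtype). A minimal sketch that would produce this sequence; routing the cache through cache_dir (rather than an HF_HOME environment variable) is an assumption here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "shenzhi-wang/Llama3-8B-Chinese-Chat"
# The log resolves files under .../transformers/hub; pointing cache_dir there is
# one way to get that layout (an assumption -- it may be set via HF_HOME instead).
CACHE_DIR = "/common/scratch/users/d/dh.huang.2023/transformers/hub"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, cache_dir=CACHE_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    cache_dir=CACHE_DIR,
    torch_dtype=torch.bfloat16,  # matches "Instantiating ... under default dtype torch.bfloat16"
    device_map="auto",           # place the bf16 weights on the single L40
)
```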
[INFO|configuration_utils.py:917] 2024-07-18 10:01:38,316 >> loading configuration file generation_config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--shenzhi-wang--Llama3-8B-Chinese-Chat/snapshots/f25f13cb2571e70e285121faceac92926b51e6f5/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-18 10:01:38,316 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "pad_token_id": 128009
}
  0%|          | 0/3000 [00:00<?, ?it/s]   (inference progress output truncated in the captured log)
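The generation config picks up pad_token_id = eos_token_id = 128009, and the 0/3000 counter marks the start of an inference pass over the 3,000 evaluation items. A minimal sketch of such a pass, assuming batched greedy decoding; batch size, max_new_tokens, and the generate_answers name are placeholders, since the actual decoding parameters are not shown in the log.

```python
import torch
from tqdm import tqdm
from transformers import GenerationConfig

# Token ids taken from the GenerationConfig dump above; max_new_tokens is a placeholder.
gen_config = GenerationConfig(
    bos_token_id=128000,
    eos_token_id=128009,
    pad_token_id=128009,
    max_new_tokens=512,
)

@torch.no_grad()
def generate_answers(model, tokenizer, prompts, batch_size=16):
    """Batched generation over the evaluation prompts (the 0/3000 progress bar above)."""
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # consistent with pad_token_id = 128009
    tokenizer.padding_side = "left"                # so the completion starts at a fixed offset
    answers = []
    for start in tqdm(range(0, len(prompts), batch_size)):
        batch = prompts[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, generation_config=gen_config)
        new_tokens = outputs[:, inputs["input_ids"].shape[1]:]  # drop the echoed prompt
        answers.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return answers
```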
[Runs 2-12: the same load-and-evaluate cycle repeats eleven more times. Each cycle reloads the tokenizer files, config.json, and the four model.safetensors shards from the same cache snapshot, emits the identical special-tokens and resume_download warnings and the identical LlamaConfig and GenerationConfig dumps shown above, and then starts a new 0/3000 inference pass whose progress output is truncated in the captured log. Only the timestamps differ:]

Run  2: load 2024-07-18 10:15:21 - 10:15:32, inference pass started 10:15:32
Run  3: load 2024-07-18 10:54:09 - 10:54:18, inference pass started 10:54:18
Run  4: load 2024-07-18 11:54:14 - 11:54:24, inference pass started 11:54:24
Run  5: load 2024-07-18 12:54:29 - 12:54:39, inference pass started 12:54:39
Run  6: load 2024-07-18 13:54:40 - 13:54:50, inference pass started 13:54:50
Run  7: load 2024-07-18 14:04:13 - 14:04:23, inference pass started 14:04:23
Run  8: load 2024-07-18 15:17:47 - 15:18:06, inference pass started 15:18:06
Run  9: load 2024-07-18 16:31:24 - 16:31:43, inference pass started 16:31:43
Run 10: load 2024-07-18 17:44:19 - 17:44:38, inference pass started 17:44:38
Run 11: load 2024-07-18 18:57:32 - 18:57:43, inference pass started 18:57:43
Run 12: load 2024-07-18 20:10:37 - 20:10:47, inference pass started 20:10:47
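Twelve load-and-generate cycles within one job, each reloading the same cached base model and then running a 3,000-item pass, is the pattern produced by a driver that iterates over several adapter checkpoints or evaluation settings. The contents of scripts/eval-mgtv.sh and its Python entry point are not shown in this log, so the following is only a hypothetical sketch of such a driver loop; the checkpoint glob, the data file name, and the output layout are all assumptions.

```python
import json
from pathlib import Path

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "shenzhi-wang/Llama3-8B-Chinese-Chat"
ADAPTER_ROOT = Path("llama-factory/saves/llama3-8b/lora/sft_bf16_p1_full")
EVAL_FILE = "data/mgtv_eval.jsonl"  # hypothetical name for the 3000-item eval set

def load_model(adapter_dir):
    """Reload tokenizer + bf16 base model from the HF cache, optionally with a LoRA adapter."""
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
    )
    if adapter_dir is not None:
        model = PeftModel.from_pretrained(model, str(adapter_dir))
    return tokenizer, model.eval()

prompts = [json.loads(line)["prompt"] for line in open(EVAL_FILE, encoding="utf-8")]
# Hypothetical: evaluate the base model first, then every saved adapter checkpoint.
candidates = [None] + sorted(ADAPTER_ROOT.glob("checkpoint-*"))

for adapter in candidates:
    tokenizer, model = load_model(adapter)
    answers = []
    for prompt in prompts:  # one 0/3000 pass per run, as in the log
        ids = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=512, pad_token_id=tokenizer.eos_token_id)
        answers.append(
            tokenizer.decode(out[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)
        )
    tag = "base" if adapter is None else adapter.name
    Path("results").mkdir(exist_ok=True)
    json.dump(answers, open(f"results/{tag}.json", "w"), ensure_ascii=False)
    del model
    torch.cuda.empty_cache()  # free VRAM before the next reload
```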