Submitting job: /common/home/users/d/dh.huang.2023/code/logical-reasoning/scripts/tune-mgtv.sh
Current Directory: /common/home/users/d/dh.huang.2023/code/logical-reasoning
Sat Jul 13 15:40:00 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     On  |   00000000:81:00.0 Off |                    0 |
| N/A   30C    P8             25W /  300W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Linux lagoon 4.18.0-553.5.1.el8_10.x86_64 #1 SMP Thu Jun 6 09:41:19 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="8.10 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.10 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.10"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          25
Model:               1
Model name:          AMD EPYC 7763 64-Core Processor
Stepping:            1
CPU MHz:             3087.936
CPU max MHz:         3529.0520
CPU min MHz:         1500.0000
BogoMIPS:            4891.15
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-127
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
MemTotal:       527669148 kB
/common/home/users/d/dh.huang.2023/code/logical-reasoning/scripts/tune-mgtv.sh: line 28: cho: command not found
Current Directory:
/common/home/users/d/dh.huang.2023/code/logical-reasoning/llama-factory
config/internlm2_5_7b_lora_sft_bf16_p2_full.yaml:
{
  "model_name_or_path": "internlm/internlm2_5-7b-chat-1m",
  "stage": "sft",
  "do_train": true,
  "finetuning_type": "lora",
  "lora_target": "all",
  "loraplus_lr_ratio": 16.0,
  "upcast_layernorm": true,
  "dataset": "alpaca_mgtv_p2",
  "template": "intern2",
  "cutoff_len": 4096,
  "max_samples": 25000,
  "overwrite_cache": true,
  "preprocessing_num_workers": 16,
  "output_dir": "saves/internlm2_5_7b/lora/sft_bf16_p2_full",
  "logging_steps": 10,
  "save_steps": 88,
  "plot_loss": true,
  "overwrite_output_dir": true,
  "per_device_train_batch_size": 32,
  "gradient_accumulation_steps": 8,
  "learning_rate": 0.0001,
  "num_train_epochs": 6.0,
  "lr_scheduler_type": "cosine",
  "warmup_ratio": 0.1,
  "bf16": true,
  "ddp_timeout": 180000000,
  "val_size": 0.1,
  "per_device_eval_batch_size": 1,
  "eval_strategy": "steps",
  "eval_steps": 88,
  "report_to": "wandb",
  "run_name": "internlm2_5_7b_p2_l40"
}
07/13/2024 15:40:12 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2108] 2024-07-13 15:40:13,065 >> loading file ./tokenizer.model from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/./tokenizer.model
[INFO|tokenization_utils_base.py:2108] 2024-07-13 15:40:13,065 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-07-13 15:40:13,065 >> loading file special_tokens_map.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/special_tokens_map.json
[INFO|tokenization_utils_base.py:2108] 2024-07-13 15:40:13,065 >> loading file tokenizer_config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/tokenizer_config.json
[INFO|tokenization_utils_base.py:2108] 2024-07-13 15:40:13,065 >> loading file tokenizer.json from cache at None
07/13/2024 15:40:14 - INFO - llamafactory.data.template - Add <|im_end|> to stop words.
07/13/2024 15:40:14 - INFO - llamafactory.data.loader - Loading dataset alpaca_mgtv_p2.json...
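The step and batch counts reported later in this log follow directly from the config above. A minimal sketch of the arithmetic (assuming the configured val_size of 0.1 is applied to the 25,000 max_samples, which matches the "Num examples = 22,500" reported below):

```python
import math

# Values taken from the config above.
max_samples = 25_000
val_size = 0.1
per_device_train_batch_size = 32
gradient_accumulation_steps = 8
num_train_epochs = 6

val_examples = round(max_samples * val_size)                 # 2,500 held out for evaluation
train_examples = max_samples - val_examples                  # 22,500
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 256 on one GPU
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 88
total_steps = steps_per_epoch * num_train_epochs               # 528

print(train_examples, effective_batch, steps_per_epoch, total_steps)
# With save_steps = eval_steps = 88, the run evaluates and checkpoints once per epoch.
```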
Converting format of dataset (num_proc=16): 0%| | 0/25000 [00:00> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json
[INFO|configuration_utils.py:733] 2024-07-13 15:40:20,559 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json
[INFO|configuration_utils.py:796] 2024-07-13 15:40:20,559 >> Model config InternLM2Config {
  "_name_or_path": "internlm/internlm2_5-7b-chat-1m",
  "architectures": [
    "InternLM2ForCausalLM"
  ],
  "attn_implementation": "eager",
  "auto_map": {
    "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config",
    "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM",
    "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM"
  },
  "bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 262144,
  "model_type": "internlm2",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 2.5,
    "type": "dynamic"
  },
  "rope_theta": 50000000,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "vocab_size": 92544
}
[INFO|modeling_utils.py:3474] 2024-07-13 15:40:20,894 >> loading weights file model.safetensors from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-07-13 15:40:20,895 >> Instantiating InternLM2ForCausalLM model under default dtype torch.bfloat16.
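The dataset named above, alpaca_mgtv_p2, is loaded through LLaMA-Factory's alpaca-style loader before being converted and tokenized. A hypothetical record in alpaca_mgtv_p2.json would look roughly like the sketch below; the field names follow the standard alpaca format, and the actual contents of the file are an assumption here, not taken from the log:

```python
# Hypothetical alpaca-format record (assumed structure, not copied from the actual dataset file).
example = {
    "instruction": "Puzzle-game host prompt: rules, puzzle, hidden answer and the player's question",
    "input": "",
    "output": "不是",  # one of the five allowed host replies: 是 / 不是 / 不重要 / 回答正确 / 问法错误
}
```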
[INFO|configuration_utils.py:962] 2024-07-13 15:40:20,896 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 2 } input_ids: [1, 92543, 1008, 364, 60403, 68625, 77794, 62591, 63352, 68309, 69323, 60687, 60364, 60355, 68309, 69776, 68411, 60387, 402, 312, 281, 262, 69102, 60497, 60382, 89428, 63352, 60388, 60353, 63352, 60388, 60382, 69401, 68252, 87114, 70436, 68865, 82168, 60355, 364, 314, 281, 262, 74243, 68290, 63352, 60930, 60353, 63352, 60930, 60357, 63352, 68421, 69059, 60355, 364, 308, 281, 262, 69102, 60497, 68251, 73477, 68574, 74004, 60550, 68287, 89214, 61683, 88840, 73687, 60355, 364, 319, 281, 262, 68390, 68772, 68287, 60353, 74243, 60530, 68420, 74740, 68855, 68544, 72719, 68423, 68538, 60387, 60357, 60359, 68278, 60359, 82568, 60359, 68855, 69077, 60359, 60593, 60408, 69583, 60355, 60684, 68855, 60354, 69844, 68559, 68411, 60387, 364, 393, 285, 262, 61369, 63352, 81953, 63352, 60930, 91085, 70670, 69059, 60353, 68855, 60387, 60357, 68319, 68278, 364, 393, 285, 262, 61369, 63352, 81953, 63352, 60930, 68336, 68376, 68319, 80078, 60876, 61015, 60389, 70670, 69059, 60353, 68855, 60387, 82568, 364, 393, 285, 262, 61369, 69102, 60497, 73912, 79865, 74004, 60550, 68287, 68319, 68287, 70436, 68865, 60353, 68855, 60387, 60593, 60408, 69583, 364, 393, 285, 262, 61369, 69102, 60497, 73912, 68406, 71940, 60362, 63352, 60930, 73687, 60353, 68855, 60387, 68855, 69077, 364, 317, 281, 262, 68855, 60366, 68336, 68535, 68574, 69344, 68347, 60353, 71452, 81256, 68423, 68322, 78818, 60666, 60355, 69192, 60353, 73263, 60581, 60419, 68278, 60420, 81256, 60397, 60419, 60358, 60420, 60355, 402, 60836, 86910, 68374, 69776, 68855, 69102, 60497, 74743, 68287, 60355, 402, 465, 63352, 60388, 334, 465, 262, 60361, 63840, 60396, 78165, 60353, 68935, 79406, 70952, 60387, 69731, 71150, 88982, 82620, 60353, 71150, 61329, 60425, 60649, 68935, 69410, 71150, 60382, 60358, 62273, 60458, 61217, 60353, 71479, 60400, 72593, 69380, 79594, 90209, 60355, 60836, 75326, 71150, 82066, 79202, 68540, 60355, 402, 465, 63352, 60930, 334, 465, 262, 73687, 69607, 60510, 70226, 60372, 62650, 60354, 61044, 61066, 69045, 60355, 71389, 61044, 61066, 89463, 60353, 61002, 60510, 70226, 73027, 70134, 60544, 61422, 60355, 68310, 74907, 60361, 71150, 88982, 82620, 68980, 60355, 69104, 60353, 71062, 61976, 60364, 60353, 70134, 60361, 72325, 60463, 68294, 60612, 70623, 60366, 60877, 60668, 60355, 74726, 60354, 61044, 61066, 68394, 70367, 60447, 69126, 70134, 60353, 69731, 68549, 60530, 69410, 71150, 61882, 60825, 60353, 70395, 70134, 60354, 62296, 60463, 60353, 72069, 86407, 68304, 63024, 60880, 60355, 68597, 68891, 73936, 60362, 69372, 60353, 71093, 72276, 60425, 68252, 82569, 70952, 60355, 402, 465, 69102, 60497, 74743, 68287, 334, 465, 262, 61882, 68279, 60548, 60780, 61076, 364, 92542, 364, 92543, 525, 11353, 364, 68278, 2] inputs: <|im_start|>user 你是一个情景猜谜游戏的主持人。游戏规则如下: 1. 参与者会得到一个谜面,谜面会描述一个简单又难以理解的事件。 2. 主持人知道谜底,谜底是谜面的答案。 3. 参与者可以询问任何封闭式问题来找寻事件的真相。 4. 对于每个问题,主持人将根据实际情况回答以下五个选项之一:是、不是、不重要、回答正确、问法错误。各回答的判断标准如下: - 若谜面和谜底能找到问题的答案,回答:是或者不是 - 若谜面和谜底不能直接或者间接推断出问题的答案,回答:不重要 - 若参与者提问不是一个封闭式问题或者问题难以理解,回答:问法错误 - 若参与者提问基本还原了谜底真相,回答:回答正确 5. 
回答中不能添加任何其它信息,也不能省略选项中的任何一个字。例如,不可以把“不是”省略成“不”。 请严格按照这些规则回答参与者提出的问题。 **谜面:** 在甄家村里,有一个古老的传说:每年南瓜丰收的季节,南瓜田里总有一个最大的南瓜会不翼而飞,村民们对此现象困惑不解。请找出南瓜失踪背后的原因。 **谜底:** 真相原来与一位年迈的农夫有关。这位农夫年轻时,曾与一位美丽的姑娘相恋。他们约定在南瓜丰收的季节结婚。然而,命运弄人,姑娘在婚礼前的一场意外中离世。悲伤的农夫为了纪念心爱的姑娘,每年都会将最大的南瓜偷走,放到姑娘的墓前,以此寄托自己的哀思。这一行为延续了多年,成为了乡村里一个神秘的传说。 **参与者提出的问题:** 偷的人信神吗 <|im_end|> <|im_start|>assistant 不是 label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 68278, 2] labels: 不是 Loading checkpoint shards: 0%| | 0/8 [00:00> All model checkpoint weights were used when initializing InternLM2ForCausalLM. [INFO|modeling_utils.py:4288] 2024-07-13 15:40:31,063 >> All the weights of InternLM2ForCausalLM were initialized from the model checkpoint at internlm/internlm2_5-7b-chat-1m. If your task is similar to the task the model of the checkpoint was trained on, you can already use InternLM2ForCausalLM for predictions without further training. 
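In the tokenized example above, every prompt position in label_ids is set to -100, and only the final two tokens keep their ids: 68278 (the answer 不是, "No") and 2 (the eos token). A minimal sketch of why that matters; the helper below is illustrative and not LLaMA-Factory's actual code:

```python
# Illustrative only: shows why the prompt positions are -100 in the label_ids dump above.
# Hugging Face's cross-entropy loss ignores positions whose label is -100, so only the
# assistant's answer tokens contribute to the supervised fine-tuning loss.
IGNORE_INDEX = -100

def mask_prompt(input_ids: list[int], prompt_len: int) -> list[int]:
    """Return labels where the prompt is masked out and only the response is learned."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# Toy usage: a 5-token prompt followed by a 2-token response.
toy_input_ids = [11, 12, 13, 14, 15, 68278, 2]   # 68278 = 不是, 2 = eos, as in the dump above
print(mask_prompt(toy_input_ids, prompt_len=5))
# [-100, -100, -100, -100, -100, 68278, 2]
```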
[INFO|configuration_utils.py:917] 2024-07-13 15:40:31,312 >> loading configuration file generation_config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-13 15:40:31,313 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": [
    2,
    92542
  ],
  "pad_token_id": 2
}
07/13/2024 15:40:31 - INFO - llamafactory.model.model_utils.checkpointing - Upcasting layernorm weights in float32.
07/13/2024 15:40:31 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/13/2024 15:40:31 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
07/13/2024 15:40:31 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/13/2024 15:40:31 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
07/13/2024 15:40:31 - INFO - llamafactory.model.model_utils.misc - Found linear modules: wqkv,w2,w1,w3,wo
07/13/2024 15:40:31 - INFO - llamafactory.model.loader - trainable params: 18,874,368 || all params: 7,756,582,912 || trainable%: 0.2433
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:641] 2024-07-13 15:40:31,769 >> Using auto half precision backend
07/13/2024 15:40:31 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|trainer.py:2078] 2024-07-13 15:40:31,996 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-07-13 15:40:31,996 >> Num examples = 22,500
[INFO|trainer.py:2080] 2024-07-13 15:40:31,996 >> Num Epochs = 6
[INFO|trainer.py:2081] 2024-07-13 15:40:31,996 >> Instantaneous batch size per device = 32
[INFO|trainer.py:2084] 2024-07-13 15:40:31,996 >> Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:2085] 2024-07-13 15:40:31,996 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2086] 2024-07-13 15:40:31,996 >> Total optimization steps = 528
[INFO|trainer.py:2087] 2024-07-13 15:40:31,999 >> Number of trainable parameters = 18,874,368
[INFO|integration_utils.py:723] 2024-07-13 15:40:32,001 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: inflaton-sg (inflaton-ai). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.4
wandb: Run data is saved locally in /common2/dh.huang.2023/code/logical-reasoning/llama-factory/wandb/run-20240713_154033-dpm5rcwx
wandb: Run `wandb offline` to turn off syncing.
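The 18,874,368 trainable parameters reported above are consistent with rank-8 LoRA adapters attached to the five linear module types listed (wqkv, wo, w1, w2, w3) in all 32 decoder layers. A rough sanity check, assuming LLaMA-Factory's default lora_rank of 8 (the rank itself does not appear in the logged config, so it is an assumption here):

```python
# Sanity check of the "trainable params" figure, assuming lora_rank = 8
# (LLaMA-Factory's default; the rank is not shown in the logged config).
rank = 8
hidden = 4096              # hidden_size from the model config above
inter = 14336              # intermediate_size
head_dim = hidden // 32    # 32 attention heads -> 128
kv_dim = 8 * head_dim      # 8 key/value heads -> 1024
layers = 32

# (in_features, out_features) of each adapted linear module in one decoder layer.
modules = {
    "wqkv": (hidden, hidden + 2 * kv_dim),  # fused q/k/v projection: 4096 -> 6144
    "wo":   (hidden, hidden),
    "w1":   (hidden, inter),
    "w3":   (hidden, inter),
    "w2":   (inter, hidden),
}

# Each LoRA adapter adds an A (in x r) and a B (r x out) matrix: r * (in + out) parameters.
per_layer = sum(rank * (fin + fout) for fin, fout in modules.values())
print(per_layer * layers)  # 18874368, matching the log
```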
wandb: Syncing run internlm2_5_7b_p2_l40 wandb: ⭐️ View project at https://wandb.ai/inflaton-ai/huggingface wandb: 🚀 View run at https://wandb.ai/inflaton-ai/huggingface/runs/dpm5rcwx 0%| | 0/528 [00:00> ***** Running Evaluation ***** [INFO|trainer.py:3721] 2024-07-13 17:47:40,097 >> Num examples = 2500 [INFO|trainer.py:3724] 2024-07-13 17:47:40,097 >> Batch size = 1 {'loss': 6.943, 'grad_norm': 3.514838457107544, 'learning_rate': 1.8867924528301888e-05, 'epoch': 0.11} {'loss': 0.446, 'grad_norm': 1.0595427751541138, 'learning_rate': 3.7735849056603776e-05, 'epoch': 0.23} {'loss': 0.3515, 'grad_norm': 0.6256385445594788, 'learning_rate': 5.660377358490566e-05, 'epoch': 0.34} {'loss': 0.288, 'grad_norm': 0.633573055267334, 'learning_rate': 7.547169811320755e-05, 'epoch': 0.45} {'loss': 0.2819, 'grad_norm': 0.4915701746940613, 'learning_rate': 9.433962264150944e-05, 'epoch': 0.57} {'loss': 0.2765, 'grad_norm': 0.40083640813827515, 'learning_rate': 9.994642390694308e-05, 'epoch': 0.68} {'loss': 0.2754, 'grad_norm': 0.7176418304443359, 'learning_rate': 9.968428675226714e-05, 'epoch': 0.8} {'loss': 0.277, 'grad_norm': 0.6853049397468567, 'learning_rate': 9.92048928531717e-05, 'epoch': 0.91} 0%| | 0/2500 [00:00> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-88 /common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( [INFO|configuration_utils.py:733] 2024-07-13 17:51:32,799 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json [INFO|configuration_utils.py:796] 2024-07-13 17:51:32,799 >> Model config InternLM2Config { "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "eager", "auto_map": { "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config", "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM" }, "bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 262144, "model_type": "internlm2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 2, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 2.5, "type": "dynamic" }, "rope_theta": 50000000, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 92544 } [INFO|tokenization_utils_base.py:2513] 2024-07-13 17:51:33,269 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-88/tokenizer_config.json [INFO|tokenization_utils_base.py:2522] 2024-07-13 17:51:33,270 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-88/special_tokens_map.json 17%|█▋ | 89/528 [2:12:19<18:53:12, 154.88s/it] 17%|█▋ | 90/528 [2:13:45<16:20:35, 134.33s/it] 17%|█▋ | 90/528 [2:13:45<16:20:35, 134.33s/it] 17%|█▋ | 91/528 [2:15:13<14:36:10, 120.30s/it] 17%|█▋ | 92/528 [2:16:39<13:20:00, 110.09s/it] 18%|█▊ | 93/528 
[2:18:07<12:28:30, 103.24s/it] 18%|█▊ | 94/528 [2:19:33<11:51:07, 98.31s/it] 18%|█▊ | 95/528 [2:21:00<11:24:36, 94.87s/it] 18%|█▊ | 96/528 [2:22:28<11:07:18, 92.68s/it] 18%|█▊ | 97/528 [2:23:53<10:50:29, 90.55s/it] 19%|█▊ | 98/528 [2:25:19<10:38:19, 89.07s/it] 19%|█▉ | 99/528 [2:26:45<10:29:46, 88.08s/it] 19%|█▉ | 100/528 [2:28:11<10:24:20, 87.52s/it] 19%|█▉ | 100/528 [2:28:11<10:24:20, 87.52s/it] 19%|█▉ | 101/528 [2:29:37<10:19:06, 86.99s/it] 19%|█▉ | 102/528 [2:31:04<10:17:33, 86.98s/it] 20%|█▉ | 103/528 [2:32:31<10:17:07, 87.12s/it] 20%|█▉ | 104/528 [2:33:57<10:12:22, 86.66s/it] 20%|█▉ | 105/528 [2:35:23<10:10:15, 86.56s/it] 20%|██ | 106/528 [2:36:49<10:06:36, 86.25s/it] 20%|██ | 107/528 [2:38:15<10:05:54, 86.35s/it] 20%|██ | 108/528 [2:39:42<10:04:35, 86.37s/it] 21%|██ | 109/528 [2:41:09<10:06:06, 86.79s/it] 21%|██ | 110/528 [2:42:36<10:03:51, 86.68s/it] 21%|██ | 110/528 [2:42:36<10:03:51, 86.68s/it] 21%|██ | 111/528 [2:44:00<9:56:42, 85.86s/it] 21%|██ | 112/528 [2:45:27<9:58:55, 86.38s/it] 21%|██▏ | 113/528 [2:46:54<9:57:35, 86.40s/it] 22%|██▏ | 114/528 [2:48:20<9:55:59, 86.38s/it] 22%|██▏ | 115/528 [2:49:47<9:55:06, 86.46s/it] 22%|██▏ | 116/528 [2:51:13<9:52:50, 86.34s/it] 22%|██▏ | 117/528 [2:52:39<9:51:30, 86.35s/it] 22%|██▏ | 118/528 [2:54:07<9:53:39, 86.88s/it] 23%|██▎ | 119/528 [2:55:34<9:52:20, 86.90s/it] 23%|██▎ | 120/528 [2:56:59<9:46:29, 86.25s/it] 23%|██▎ | 120/528 [2:56:59<9:46:29, 86.25s/it] 23%|██▎ | 121/528 [2:58:25<9:44:08, 86.11s/it] 23%|██▎ | 122/528 [2:59:51<9:42:30, 86.08s/it] 23%|██▎ | 123/528 [3:01:18<9:43:51, 86.50s/it] 23%|██▎ | 124/528 [3:02:44<9:40:59, 86.29s/it] 24%|██▎ | 125/528 [3:04:09<9:36:43, 85.87s/it] 24%|██▍ | 126/528 [3:05:35<9:35:25, 85.88s/it] 24%|██▍ | 127/528 [3:07:01<9:34:56, 86.03s/it] 24%|██▍ | 128/528 [3:08:28<9:34:29, 86.17s/it] 24%|██▍ | 129/528 [3:09:52<9:29:50, 85.69s/it] 25%|██▍ | 130/528 [3:11:18<9:29:26, 85.85s/it] 25%|██▍ | 130/528 [3:11:18<9:29:26, 85.85s/it] 25%|██▍ | 131/528 [3:12:45<9:29:38, 86.09s/it] 25%|██▌ | 132/528 [3:14:12<9:29:57, 86.36s/it] 25%|██▌ | 133/528 [3:15:38<9:28:04, 86.29s/it] 25%|██▌ | 134/528 [3:17:05<9:26:58, 86.34s/it] 26%|██▌ | 135/528 [3:18:31<9:24:35, 86.20s/it] 26%|██▌ | 136/528 [3:19:55<9:19:07, 85.58s/it] 26%|██▌ | 137/528 [3:21:22<9:20:39, 86.03s/it] 26%|██▌ | 138/528 [3:22:49<9:21:05, 86.32s/it] 26%|██▋ | 139/528 [3:24:15<9:18:54, 86.21s/it] 27%|██▋ | 140/528 [3:25:42<9:19:25, 86.51s/it] 27%|██▋ | 140/528 [3:25:42<9:19:25, 86.51s/it] 27%|██▋ | 141/528 [3:27:07<9:16:06, 86.22s/it] 27%|██▋ | 142/528 [3:28:36<9:18:59, 86.89s/it] 27%|██▋ | 143/528 [3:30:01<9:13:41, 86.29s/it] 27%|██▋ | 144/528 [3:31:26<9:10:53, 86.08s/it] 27%|██▋ | 145/528 [3:32:52<9:08:43, 85.96s/it] 28%|██▊ | 146/528 [3:34:20<9:10:25, 86.45s/it] 28%|██▊ | 147/528 [3:35:45<9:05:58, 85.98s/it] 28%|██▊ | 148/528 [3:37:12<9:08:05, 86.54s/it] 28%|██▊ | 149/528 [3:38:38<9:04:17, 86.17s/it] 28%|██▊ | 150/528 [3:40:03<9:01:55, 86.02s/it] 28%|██▊ | 150/528 [3:40:03<9:01:55, 86.02s/it] 29%|██▊ | 151/528 [3:41:28<8:58:22, 85.68s/it] 29%|██▉ | 152/528 [3:42:53<8:55:37, 85.47s/it] 29%|██▉ | 153/528 [3:44:20<8:56:05, 85.78s/it] 29%|██▉ | 154/528 [3:45:49<9:00:23, 86.69s/it] 29%|██▉ | 155/528 [3:47:16<8:59:47, 86.83s/it] 30%|██▉ | 156/528 [3:48:42<8:57:16, 86.66s/it] 30%|██▉ | 157/528 [3:50:08<8:53:52, 86.34s/it] 30%|██▉ | 158/528 [3:51:33<8:50:51, 86.08s/it] 30%|███ | 159/528 [3:52:59<8:49:30, 86.10s/it] 30%|███ | 160/528 [3:54:27<8:50:16, 86.46s/it] 30%|███ | 160/528 [3:54:27<8:50:16, 86.46s/it] 30%|███ | 161/528 [3:55:52<8:46:19, 86.05s/it] 
31%|███ | 162/528 [3:57:18<8:44:56, 86.06s/it] 31%|███ | 163/528 [3:58:44<8:44:15, 86.18s/it] 31%|███ | 164/528 [4:00:11<8:43:27, 86.28s/it] 31%|███▏ | 165/528 [4:01:36<8:39:26, 85.86s/it] 31%|███▏ | 166/528 [4:03:02<8:38:56, 86.01s/it] 32%|███▏ | 167/528 [4:04:29<8:39:33, 86.35s/it] 32%|███▏ | 168/528 [4:05:56<8:39:16, 86.55s/it] 32%|███▏ | 169/528 [4:07:23<8:38:28, 86.65s/it] 32%|███▏ | 170/528 [4:08:50<8:37:48, 86.78s/it] 32%|███▏ | 170/528 [4:08:50<8:37:48, 86.78s/it] 32%|███▏ | 171/528 [4:10:16<8:34:13, 86.43s/it] 33%|███▎ | 172/528 [4:11:42<8:32:37, 86.40s/it] 33%|███▎ | 173/528 [4:13:09<8:32:03, 86.55s/it] 33%|███▎ | 174/528 [4:14:37<8:32:43, 86.90s/it] 33%|███▎ | 175/528 [4:16:04<8:32:17, 87.07s/it] 33%|███▎ | 176/528 [4:17:22<8:13:54, 84.19s/it][INFO|trainer.py:3719] 2024-07-13 19:58:03,230 >> ***** Running Evaluation ***** [INFO|trainer.py:3721] 2024-07-13 19:58:03,230 >> Num examples = 2500 [INFO|trainer.py:3724] 2024-07-13 19:58:03,230 >> Batch size = 1 {'eval_loss': 0.26003387570381165, 'eval_accuracy': 0.9030666666666668, 'eval_runtime': 231.9406, 'eval_samples_per_second': 10.779, 'eval_steps_per_second': 10.779, 'epoch': 1.0} {'loss': 0.2619, 'grad_norm': 0.4268323481082916, 'learning_rate': 9.851033847720166e-05, 'epoch': 1.02} {'loss': 0.2385, 'grad_norm': 0.9503114819526672, 'learning_rate': 9.760366073392246e-05, 'epoch': 1.14} {'loss': 0.2341, 'grad_norm': 0.3606574237346649, 'learning_rate': 9.648882429441257e-05, 'epoch': 1.25} {'loss': 0.2447, 'grad_norm': 0.7226484417915344, 'learning_rate': 9.517070405476575e-05, 'epoch': 1.36} {'loss': 0.2441, 'grad_norm': 0.8543397188186646, 'learning_rate': 9.365506381941066e-05, 'epoch': 1.48} {'loss': 0.2379, 'grad_norm': 0.800394594669342, 'learning_rate': 9.194853109746074e-05, 'epoch': 1.59} {'loss': 0.2434, 'grad_norm': 0.5756838321685791, 'learning_rate': 9.005856812230304e-05, 'epoch': 1.7} {'loss': 0.2352, 'grad_norm': 1.0771032571792603, 'learning_rate': 8.799343922115044e-05, 'epoch': 1.82} {'loss': 0.2427, 'grad_norm': 0.4805872440338135, 'learning_rate': 8.576217467724128e-05, 'epoch': 1.93} 0%| | 0/2500 [00:00> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-176 /common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. 
warnings.warn( [INFO|configuration_utils.py:733] 2024-07-13 20:01:55,394 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json [INFO|configuration_utils.py:796] 2024-07-13 20:01:55,395 >> Model config InternLM2Config { "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "eager", "auto_map": { "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config", "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM" }, "bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 262144, "model_type": "internlm2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 2, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 2.5, "type": "dynamic" }, "rope_theta": 50000000, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 92544 } [INFO|tokenization_utils_base.py:2513] 2024-07-13 20:01:55,605 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-176/tokenizer_config.json [INFO|tokenization_utils_base.py:2522] 2024-07-13 20:01:55,606 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-176/special_tokens_map.json 34%|███▎ | 177/528 [4:22:41<15:04:52, 154.68s/it] 34%|███▎ | 178/528 [4:24:08<13:03:38, 134.34s/it] 34%|███▍ | 179/528 [4:25:34<11:37:08, 119.85s/it] 34%|███▍ | 180/528 [4:26:58<10:33:54, 109.29s/it] 34%|███▍ | 180/528 [4:26:58<10:33:54, 109.29s/it] 34%|███▍ | 181/528 [4:28:25<9:53:04, 102.55s/it] 34%|███▍ | 182/528 [4:29:55<9:29:52, 98.82s/it] 35%|███▍ | 183/528 [4:31:23<9:08:29, 95.39s/it] 35%|███▍ | 184/528 [4:32:49<8:50:52, 92.59s/it] 35%|███▌ | 185/528 [4:34:16<8:39:57, 90.96s/it] 35%|███▌ | 186/528 [4:35:41<8:29:28, 89.38s/it] 35%|███▌ | 187/528 [4:37:09<8:24:55, 88.84s/it] 36%|███▌ | 188/528 [4:38:35<8:17:58, 87.88s/it] 36%|███▌ | 189/528 [4:40:02<8:15:06, 87.63s/it] 36%|███▌ | 190/528 [4:41:30<8:14:16, 87.74s/it] 36%|███▌ | 190/528 [4:41:30<8:14:16, 87.74s/it] 36%|███▌ | 191/528 [4:42:56<8:10:19, 87.30s/it] 36%|███▋ | 192/528 [4:44:22<8:07:24, 87.04s/it] 37%|███▋ | 193/528 [4:45:49<8:05:34, 86.97s/it] 37%|███▋ | 194/528 [4:47:15<8:01:39, 86.53s/it] 37%|███▋ | 195/528 [4:48:41<8:00:00, 86.49s/it] 37%|███▋ | 196/528 [4:50:10<8:02:39, 87.23s/it] 37%|███▋ | 197/528 [4:51:36<7:59:46, 86.97s/it] 38%|███▊ | 198/528 [4:53:03<7:57:42, 86.86s/it] 38%|███▊ | 199/528 [4:54:29<7:54:50, 86.60s/it] 38%|███▊ | 200/528 [4:55:53<7:49:18, 85.85s/it] 38%|███▊ | 200/528 [4:55:53<7:49:18, 85.85s/it] 38%|███▊ | 201/528 [4:57:18<7:46:04, 85.52s/it] 38%|███▊ | 202/528 [4:58:45<7:46:51, 85.93s/it] 38%|███▊ | 203/528 [5:00:11<7:45:55, 86.02s/it] 39%|███▊ | 204/528 [5:01:39<7:48:27, 86.75s/it] 39%|███▉ | 205/528 [5:03:07<7:48:15, 86.98s/it] 39%|███▉ | 206/528 [5:04:33<7:44:57, 86.64s/it] 39%|███▉ | 207/528 [5:05:58<7:41:49, 86.32s/it] 39%|███▉ | 208/528 [5:07:23<7:38:00, 85.88s/it] 40%|███▉ | 209/528 [5:08:49<7:36:57, 85.95s/it] 40%|███▉ | 210/528 [5:10:16<7:36:32, 86.14s/it] 40%|███▉ | 210/528 [5:10:16<7:36:32, 86.14s/it] 40%|███▉ | 211/528 [5:11:42<7:34:20, 85.99s/it] 
40%|████ | 212/528 [5:13:07<7:32:37, 85.94s/it] 40%|████ | 213/528 [5:14:36<7:35:12, 86.70s/it] 41%|████ | 214/528 [5:16:01<7:30:41, 86.12s/it] 41%|████ | 215/528 [5:17:26<7:28:17, 85.93s/it] 41%|████ | 216/528 [5:18:53<7:27:43, 86.10s/it] 41%|████ | 217/528 [5:20:21<7:29:50, 86.78s/it] 41%|████▏ | 218/528 [5:21:46<7:25:06, 86.15s/it] 41%|████▏ | 219/528 [5:23:11<7:22:04, 85.84s/it] 42%|████▏ | 220/528 [5:24:35<7:17:35, 85.25s/it] 42%|████▏ | 220/528 [5:24:35<7:17:35, 85.25s/it] 42%|████▏ | 221/528 [5:26:00<7:16:06, 85.23s/it] 42%|████▏ | 222/528 [5:27:25<7:14:02, 85.11s/it] 42%|████▏ | 223/528 [5:28:53<7:17:47, 86.12s/it] 42%|████▏ | 224/528 [5:30:18<7:14:02, 85.67s/it] 43%|████▎ | 225/528 [5:31:44<7:13:53, 85.92s/it] 43%|████▎ | 226/528 [5:33:11<7:12:59, 86.03s/it] 43%|████▎ | 227/528 [5:34:37<7:12:37, 86.24s/it] 43%|████▎ | 228/528 [5:36:05<7:12:49, 86.57s/it] 43%|████▎ | 229/528 [5:37:32<7:13:11, 86.93s/it] 44%|████▎ | 230/528 [5:39:00<7:13:20, 87.25s/it] 44%|████▎ | 230/528 [5:39:00<7:13:20, 87.25s/it] 44%|████▍ | 231/528 [5:40:26<7:10:08, 86.90s/it] 44%|████▍ | 232/528 [5:41:51<7:05:14, 86.20s/it] 44%|████▍ | 233/528 [5:43:17<7:03:46, 86.19s/it] 44%|████▍ | 234/528 [5:44:44<7:03:11, 86.36s/it] 45%|████▍ | 235/528 [5:46:10<7:01:21, 86.29s/it] 45%|████▍ | 236/528 [5:47:38<7:01:32, 86.62s/it] 45%|████▍ | 237/528 [5:49:04<7:00:34, 86.72s/it] 45%|████▌ | 238/528 [5:50:30<6:57:54, 86.46s/it] 45%|████▌ | 239/528 [5:51:57<6:57:04, 86.59s/it] 45%|████▌ | 240/528 [5:53:23<6:54:05, 86.27s/it] 45%|████▌ | 240/528 [5:53:23<6:54:05, 86.27s/it] 46%|████▌ | 241/528 [5:54:49<6:52:43, 86.28s/it] 46%|████▌ | 242/528 [5:56:14<6:49:30, 85.91s/it] 46%|████▌ | 243/528 [5:57:40<6:48:23, 85.98s/it] 46%|████▌ | 244/528 [5:59:06<6:46:57, 85.98s/it] 46%|████▋ | 245/528 [6:00:33<6:46:56, 86.28s/it] 47%|████▋ | 246/528 [6:01:59<6:45:30, 86.28s/it] 47%|████▋ | 247/528 [6:03:25<6:43:22, 86.13s/it] 47%|████▋ | 248/528 [6:04:51<6:41:52, 86.11s/it] 47%|████▋ | 249/528 [6:06:18<6:40:46, 86.19s/it] 47%|████▋ | 250/528 [6:07:44<6:40:07, 86.36s/it] 47%|████▋ | 250/528 [6:07:44<6:40:07, 86.36s/it] 48%|████▊ | 251/528 [6:09:10<6:37:53, 86.19s/it] 48%|████▊ | 252/528 [6:10:35<6:34:39, 85.80s/it] 48%|████▊ | 253/528 [6:12:01<6:33:11, 85.79s/it] 48%|████▊ | 254/528 [6:13:28<6:33:56, 86.26s/it] 48%|████▊ | 255/528 [6:14:55<6:33:36, 86.51s/it] 48%|████▊ | 256/528 [6:16:22<6:31:51, 86.44s/it] 49%|████▊ | 257/528 [6:17:48<6:30:57, 86.56s/it] 49%|████▉ | 258/528 [6:19:14<6:28:14, 86.27s/it] 49%|████▉ | 259/528 [6:20:41<6:27:38, 86.46s/it] 49%|████▉ | 260/528 [6:22:07<6:25:47, 86.37s/it] 49%|████▉ | 260/528 [6:22:07<6:25:47, 86.37s/it] 49%|████▉ | 261/528 [6:23:35<6:25:46, 86.69s/it] 50%|████▉ | 262/528 [6:25:01<6:24:15, 86.67s/it] 50%|████▉ | 263/528 [6:26:29<6:23:50, 86.91s/it] 50%|█████ | 264/528 [6:27:46<6:09:38, 84.01s/it][INFO|trainer.py:3719] 2024-07-13 22:08:27,587 >> ***** Running Evaluation ***** [INFO|trainer.py:3721] 2024-07-13 22:08:27,588 >> Num examples = 2500 [INFO|trainer.py:3724] 2024-07-13 22:08:27,588 >> Batch size = 1 {'eval_loss': 0.25738978385925293, 'eval_accuracy': 0.9006, 'eval_runtime': 231.2131, 'eval_samples_per_second': 10.813, 'eval_steps_per_second': 10.813, 'epoch': 2.0} {'loss': 0.22, 'grad_norm': 0.5219587683677673, 'learning_rate': 8.337453124270863e-05, 'epoch': 2.05} {'loss': 0.1787, 'grad_norm': 0.6363154053688049, 'learning_rate': 8.084094947478556e-05, 'epoch': 2.16} {'loss': 0.1647, 'grad_norm': 0.6807820796966553, 'learning_rate': 7.817250808190483e-05, 'epoch': 2.27} {'loss': 0.1828, 
'grad_norm': 0.5443515777587891, 'learning_rate': 7.538087547932585e-05, 'epoch': 2.39} {'loss': 0.1782, 'grad_norm': 0.4641902446746826, 'learning_rate': 7.247825876612353e-05, 'epoch': 2.5} {'loss': 0.1942, 'grad_norm': 0.5865933299064636, 'learning_rate': 6.947735034665002e-05, 'epoch': 2.61} {'loss': 0.1852, 'grad_norm': 0.5332173705101013, 'learning_rate': 6.639127242987988e-05, 'epoch': 2.73} {'loss': 0.1936, 'grad_norm': 0.5550218820571899, 'learning_rate': 6.323351964932908e-05, 'epoch': 2.84} {'loss': 0.1813, 'grad_norm': 0.6850063800811768, 'learning_rate': 6.001790005445607e-05, 'epoch': 2.95} 0%| | 0/2500 [00:00> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-264 /common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( [INFO|configuration_utils.py:733] 2024-07-13 22:12:19,354 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json [INFO|configuration_utils.py:796] 2024-07-13 22:12:19,355 >> Model config InternLM2Config { "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "eager", "auto_map": { "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config", "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM" }, "bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 262144, "model_type": "internlm2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 2, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 2.5, "type": "dynamic" }, "rope_theta": 50000000, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 92544 } [INFO|tokenization_utils_base.py:2513] 2024-07-13 22:12:19,564 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-264/tokenizer_config.json [INFO|tokenization_utils_base.py:2522] 2024-07-13 22:12:19,565 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-264/special_tokens_map.json 50%|█████ | 265/528 [6:33:04<11:16:12, 154.27s/it] 50%|█████ | 266/528 [6:34:30<9:44:11, 133.78s/it] 51%|█████ | 267/528 [6:35:58<8:41:40, 119.93s/it] 51%|█████ | 268/528 [6:37:21<7:52:37, 109.07s/it] 51%|█████ | 269/528 [6:38:47<7:20:00, 101.93s/it] 51%|█████ | 270/528 [6:40:14<6:59:35, 97.58s/it] 51%|█████ | 270/528 [6:40:14<6:59:35, 97.58s/it] 51%|█████▏ | 271/528 [6:41:40<6:43:19, 94.16s/it] 52%|█████▏ | 272/528 [6:43:07<6:31:37, 91.79s/it] 52%|█████▏ | 273/528 [6:44:33<6:22:59, 90.11s/it] 52%|█████▏ | 274/528 [6:46:01<6:18:59, 89.52s/it] 52%|█████▏ | 275/528 [6:47:27<6:13:37, 88.61s/it] 52%|█████▏ | 276/528 [6:48:55<6:10:29, 88.21s/it] 52%|█████▏ | 277/528 [6:50:20<6:05:25, 87.35s/it] 53%|█████▎ | 278/528 [6:51:47<6:03:49, 87.32s/it] 53%|█████▎ | 279/528 [6:53:13<6:00:05, 86.77s/it] 53%|█████▎ | 280/528 [6:54:39<5:58:24, 86.71s/it] 
53%|█████▎ | 280/528 [6:54:39<5:58:24, 86.71s/it] 53%|█████▎ | 281/528 [6:56:05<5:55:57, 86.47s/it] 53%|█████▎ | 282/528 [6:57:30<5:52:11, 85.90s/it] 54%|█████▎ | 283/528 [6:58:56<5:51:35, 86.11s/it] 54%|█████▍ | 284/528 [7:00:21<5:48:20, 85.66s/it] 54%|█████▍ | 285/528 [7:01:48<5:48:54, 86.15s/it] 54%|█████▍ | 286/528 [7:03:14<5:46:43, 85.96s/it] 54%|█████▍ | 287/528 [7:04:41<5:47:02, 86.40s/it] 55%|█████▍ | 288/528 [7:06:07<5:45:05, 86.27s/it] 55%|█████▍ | 289/528 [7:07:35<5:45:10, 86.66s/it] 55%|█████▍ | 290/528 [7:09:00<5:42:28, 86.34s/it] 55%|█████▍ | 290/528 [7:09:00<5:42:28, 86.34s/it] 55%|█████▌ | 291/528 [7:10:26<5:40:47, 86.28s/it] 55%|█████▌ | 292/528 [7:11:54<5:40:17, 86.52s/it] 55%|█████▌ | 293/528 [7:13:19<5:37:44, 86.23s/it] 56%|█████▌ | 294/528 [7:14:47<5:37:40, 86.58s/it] 56%|█████▌ | 295/528 [7:16:13<5:36:00, 86.52s/it] 56%|█████▌ | 296/528 [7:17:38<5:33:25, 86.23s/it] 56%|█████▋ | 297/528 [7:19:05<5:32:01, 86.24s/it] 56%|█████▋ | 298/528 [7:20:30<5:29:54, 86.06s/it] 57%|█████▋ | 299/528 [7:21:56<5:28:15, 86.01s/it] 57%|█████▋ | 300/528 [7:23:24<5:29:07, 86.61s/it] 57%|█████▋ | 300/528 [7:23:24<5:29:07, 86.61s/it] 57%|█████▋ | 301/528 [7:24:51<5:28:11, 86.75s/it] 57%|█████▋ | 302/528 [7:26:17<5:25:59, 86.55s/it] 57%|█████▋ | 303/528 [7:27:43<5:23:37, 86.30s/it] 58%|█████▊ | 304/528 [7:29:10<5:22:19, 86.34s/it] 58%|█████▊ | 305/528 [7:30:35<5:19:34, 85.98s/it] 58%|█████▊ | 306/528 [7:32:03<5:20:40, 86.67s/it] 58%|█████▊ | 307/528 [7:33:29<5:18:46, 86.54s/it] 58%|█████▊ | 308/528 [7:34:55<5:15:58, 86.17s/it] 59%|█████▊ | 309/528 [7:36:18<5:11:58, 85.47s/it] 59%|█████▊ | 310/528 [7:37:45<5:11:59, 85.87s/it] 59%|█████▊ | 310/528 [7:37:45<5:11:59, 85.87s/it] 59%|█████▉ | 311/528 [7:39:10<5:09:30, 85.58s/it] 59%|█████▉ | 312/528 [7:40:37<5:09:32, 85.98s/it] 59%|█████▉ | 313/528 [7:42:04<5:09:27, 86.36s/it] 59%|█████▉ | 314/528 [7:43:29<5:06:49, 86.03s/it] 60%|█████▉ | 315/528 [7:44:55<5:04:46, 85.85s/it] 60%|█████▉ | 316/528 [7:46:21<5:03:55, 86.02s/it] 60%|██████ | 317/528 [7:47:49<5:03:57, 86.43s/it] 60%|██████ | 318/528 [7:49:14<5:01:22, 86.11s/it] 60%|██████ | 319/528 [7:50:42<5:01:27, 86.54s/it] 61%|██████ | 320/528 [7:52:08<5:00:11, 86.59s/it] 61%|██████ | 320/528 [7:52:08<5:00:11, 86.59s/it] 61%|██████ | 321/528 [7:53:34<4:57:57, 86.37s/it] 61%|██████ | 322/528 [7:54:59<4:54:45, 85.85s/it] 61%|██████ | 323/528 [7:56:26<4:54:11, 86.11s/it] 61%|██████▏ | 324/528 [7:57:53<4:54:05, 86.50s/it] 62%|██████▏ | 325/528 [7:59:20<4:52:44, 86.52s/it] 62%|██████▏ | 326/528 [8:00:45<4:50:28, 86.28s/it] 62%|██████▏ | 327/528 [8:02:12<4:49:50, 86.52s/it] 62%|██████▏ | 328/528 [8:03:40<4:49:02, 86.71s/it] 62%|██████▏ | 329/528 [8:05:04<4:45:46, 86.16s/it] 62%|██████▎ | 330/528 [8:06:31<4:44:28, 86.20s/it] 62%|██████▎ | 330/528 [8:06:31<4:44:28, 86.20s/it] 63%|██████▎ | 331/528 [8:07:57<4:43:03, 86.21s/it] 63%|██████▎ | 332/528 [8:09:22<4:40:42, 85.93s/it] 63%|██████▎ | 333/528 [8:10:48<4:39:18, 85.94s/it] 63%|██████▎ | 334/528 [8:12:14<4:38:07, 86.02s/it] 63%|██████▎ | 335/528 [8:13:40<4:36:17, 85.89s/it] 64%|██████▎ | 336/528 [8:15:05<4:34:18, 85.72s/it] 64%|██████▍ | 337/528 [8:16:31<4:32:50, 85.71s/it] 64%|██████▍ | 338/528 [8:17:58<4:32:41, 86.11s/it] 64%|██████▍ | 339/528 [8:19:24<4:31:14, 86.11s/it] 64%|██████▍ | 340/528 [8:20:52<4:31:35, 86.68s/it] 64%|██████▍ | 340/528 [8:20:52<4:31:35, 86.68s/it] 65%|██████▍ | 341/528 [8:22:18<4:29:06, 86.34s/it] 65%|██████▍ | 342/528 [8:23:44<4:27:43, 86.36s/it] 65%|██████▍ | 343/528 [8:25:11<4:26:31, 86.44s/it] 65%|██████▌ | 344/528 
[8:26:37<4:25:14, 86.49s/it] 65%|██████▌ | 345/528 [8:28:04<4:23:37, 86.43s/it] 66%|██████▌ | 346/528 [8:29:30<4:22:01, 86.38s/it] 66%|██████▌ | 347/528 [8:30:56<4:20:27, 86.34s/it] 66%|██████▌ | 348/528 [8:32:23<4:19:14, 86.41s/it] 66%|██████▌ | 349/528 [8:33:49<4:17:36, 86.35s/it] 66%|██████▋ | 350/528 [8:35:14<4:15:17, 86.05s/it] 66%|██████▋ | 350/528 [8:35:14<4:15:17, 86.05s/it] 66%|██████▋ | 351/528 [8:36:41<4:14:07, 86.14s/it] 67%|██████▋ | 352/528 [8:37:57<4:03:57, 83.16s/it][INFO|trainer.py:3719] 2024-07-14 00:18:38,547 >> ***** Running Evaluation ***** [INFO|trainer.py:3721] 2024-07-14 00:18:38,547 >> Num examples = 2500 [INFO|trainer.py:3724] 2024-07-14 00:18:38,547 >> Batch size = 1 {'eval_loss': 0.2705931067466736, 'eval_accuracy': 0.9027, 'eval_runtime': 231.0113, 'eval_samples_per_second': 10.822, 'eval_steps_per_second': 10.822, 'epoch': 3.0} {'loss': 0.14, 'grad_norm': 0.42733630537986755, 'learning_rate': 5.675847473157485e-05, 'epoch': 3.07} {'loss': 0.1197, 'grad_norm': 0.5972977876663208, 'learning_rate': 5.3469496318302204e-05, 'epoch': 3.18} {'loss': 0.1198, 'grad_norm': 0.4995785653591156, 'learning_rate': 5.016534668039976e-05, 'epoch': 3.3} {'loss': 0.1131, 'grad_norm': 0.5500032305717468, 'learning_rate': 4.6860474023534335e-05, 'epoch': 3.41} {'loss': 0.1185, 'grad_norm': 0.4452584683895111, 'learning_rate': 4.3569329714950704e-05, 'epoch': 3.52} {'loss': 0.1197, 'grad_norm': 0.4754205346107483, 'learning_rate': 4.0306305091319595e-05, 'epoch': 3.64} {'loss': 0.122, 'grad_norm': 0.6347799301147461, 'learning_rate': 3.7085668529084184e-05, 'epoch': 3.75} {'loss': 0.1163, 'grad_norm': 0.48911160230636597, 'learning_rate': 3.392150305248024e-05, 'epoch': 3.86} {'loss': 0.1263, 'grad_norm': 0.6460514068603516, 'learning_rate': 3.082764475205442e-05, 'epoch': 3.98} 0%| | 0/2500 [00:00> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-352 /common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. 
warnings.warn( [INFO|configuration_utils.py:733] 2024-07-14 00:22:32,476 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json [INFO|configuration_utils.py:796] 2024-07-14 00:22:32,478 >> Model config InternLM2Config { "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "eager", "auto_map": { "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config", "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM" }, "bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 262144, "model_type": "internlm2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 2, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 2.5, "type": "dynamic" }, "rope_theta": 50000000, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 92544 } [INFO|tokenization_utils_base.py:2513] 2024-07-14 00:22:32,704 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-352/tokenizer_config.json [INFO|tokenization_utils_base.py:2522] 2024-07-14 00:22:32,706 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-352/special_tokens_map.json 67%|██████▋ | 353/528 [8:43:17<7:30:01, 154.29s/it] 67%|██████▋ | 354/528 [8:44:42<6:27:22, 133.58s/it] 67%|██████▋ | 355/528 [8:46:10<5:45:45, 119.92s/it] 67%|██████▋ | 356/528 [8:47:35<5:13:17, 109.29s/it] 68%|██████▊ | 357/528 [8:49:01<4:51:32, 102.30s/it] 68%|██████▊ | 358/528 [8:50:27<4:35:54, 97.38s/it] 68%|██████▊ | 359/528 [8:51:53<4:24:40, 93.97s/it] 68%|██████▊ | 360/528 [8:53:17<4:15:05, 91.11s/it] 68%|██████▊ | 360/528 [8:53:17<4:15:05, 91.11s/it] 68%|██████▊ | 361/528 [8:54:43<4:09:25, 89.61s/it] 69%|██████▊ | 362/528 [8:56:09<4:04:54, 88.52s/it] 69%|██████▉ | 363/528 [8:57:35<4:01:20, 87.76s/it] 69%|██████▉ | 364/528 [8:59:00<3:57:46, 86.99s/it] 69%|██████▉ | 365/528 [9:00:25<3:54:30, 86.32s/it] 69%|██████▉ | 366/528 [9:01:51<3:52:44, 86.20s/it] 70%|██████▉ | 367/528 [9:03:18<3:51:51, 86.41s/it] 70%|██████▉ | 368/528 [9:04:44<3:50:19, 86.37s/it] 70%|██████▉ | 369/528 [9:06:09<3:47:32, 85.87s/it] 70%|███████ | 370/528 [9:07:34<3:45:38, 85.69s/it] 70%|███████ | 370/528 [9:07:34<3:45:38, 85.69s/it] 70%|███████ | 371/528 [9:09:01<3:44:39, 85.86s/it] 70%|███████ | 372/528 [9:10:26<3:42:52, 85.72s/it] 71%|███████ | 373/528 [9:11:52<3:41:42, 85.82s/it] 71%|███████ | 374/528 [9:13:19<3:41:12, 86.19s/it] 71%|███████ | 375/528 [9:14:45<3:39:39, 86.14s/it] 71%|███████ | 376/528 [9:16:14<3:39:58, 86.83s/it] 71%|███████▏ | 377/528 [9:17:39<3:37:44, 86.52s/it] 72%|███████▏ | 378/528 [9:19:04<3:35:16, 86.11s/it] 72%|███████▏ | 379/528 [9:20:31<3:34:16, 86.28s/it] 72%|███████▏ | 380/528 [9:21:58<3:33:12, 86.44s/it] 72%|███████▏ | 380/528 [9:21:58<3:33:12, 86.44s/it] 72%|███████▏ | 381/528 [9:23:23<3:30:47, 86.04s/it] 72%|███████▏ | 382/528 [9:24:49<3:29:33, 86.12s/it] 73%|███████▎ | 383/528 [9:26:15<3:28:04, 86.10s/it] 73%|███████▎ | 384/528 [9:27:42<3:27:15, 86.36s/it] 73%|███████▎ | 385/528 [9:29:08<3:24:56, 85.99s/it] 73%|███████▎ | 386/528 
[9:30:33<3:23:00, 85.78s/it] 73%|███████▎ | 387/528 [9:32:00<3:22:30, 86.18s/it] 73%|███████▎ | 388/528 [9:33:26<3:20:57, 86.13s/it] 74%|███████▎ | 389/528 [9:34:51<3:19:03, 85.93s/it] 74%|███████▍ | 390/528 [9:36:17<3:17:40, 85.95s/it] 74%|███████▍ | 390/528 [9:36:17<3:17:40, 85.95s/it] 74%|███████▍ | 391/528 [9:37:44<3:16:44, 86.16s/it] 74%|███████▍ | 392/528 [9:39:10<3:15:23, 86.20s/it] 74%|███████▍ | 393/528 [9:40:37<3:14:14, 86.33s/it] 75%|███████▍ | 394/528 [9:42:04<3:13:17, 86.55s/it] 75%|███████▍ | 395/528 [9:43:30<3:11:30, 86.39s/it] 75%|███████▌ | 396/528 [9:44:56<3:10:05, 86.41s/it] 75%|███████▌ | 397/528 [9:46:23<3:08:36, 86.38s/it] 75%|███████▌ | 398/528 [9:47:50<3:07:25, 86.51s/it] 76%|███████▌ | 399/528 [9:49:17<3:06:15, 86.63s/it] 76%|███████▌ | 400/528 [9:50:45<3:05:53, 87.13s/it] 76%|███████▌ | 400/528 [9:50:45<3:05:53, 87.13s/it] 76%|███████▌ | 401/528 [9:52:12<3:04:09, 87.00s/it] 76%|███████▌ | 402/528 [9:53:39<3:03:13, 87.25s/it] 76%|███████▋ | 403/528 [9:55:06<3:01:08, 86.95s/it] 77%|███████▋ | 404/528 [9:56:33<2:59:59, 87.09s/it] 77%|███████▋ | 405/528 [9:58:00<2:58:14, 86.95s/it] 77%|███████▋ | 406/528 [9:59:28<2:57:37, 87.36s/it] 77%|███████▋ | 407/528 [10:00:56<2:56:39, 87.60s/it] 77%|███████▋ | 408/528 [10:02:24<2:55:16, 87.64s/it] 77%|███████▋ | 409/528 [10:03:50<2:52:46, 87.11s/it] 78%|███████▊ | 410/528 [10:05:19<2:52:18, 87.62s/it] 78%|███████▊ | 410/528 [10:05:19<2:52:18, 87.62s/it] 78%|███████▊ | 411/528 [10:06:47<2:51:16, 87.83s/it] 78%|███████▊ | 412/528 [10:08:15<2:49:48, 87.83s/it] 78%|███████▊ | 413/528 [10:09:41<2:47:22, 87.32s/it] 78%|███████▊ | 414/528 [10:11:09<2:46:07, 87.43s/it] 79%|███████▊ | 415/528 [10:12:35<2:44:24, 87.30s/it] 79%|███████▉ | 416/528 [10:14:02<2:42:32, 87.08s/it] 79%|███████▉ | 417/528 [10:15:30<2:41:29, 87.29s/it] 79%|███████▉ | 418/528 [10:16:57<2:40:03, 87.31s/it] 79%|███████▉ | 419/528 [10:18:24<2:38:12, 87.09s/it] 80%|███████▉ | 420/528 [10:19:52<2:37:16, 87.38s/it] 80%|███████▉ | 420/528 [10:19:52<2:37:16, 87.38s/it] 80%|███████▉ | 421/528 [10:21:19<2:35:27, 87.17s/it] 80%|███████▉ | 422/528 [10:22:46<2:34:12, 87.29s/it] 80%|████████ | 423/528 [10:24:14<2:33:05, 87.48s/it] 80%|████████ | 424/528 [10:25:42<2:31:54, 87.64s/it] 80%|████████ | 425/528 [10:27:09<2:30:13, 87.51s/it] 81%|████████ | 426/528 [10:28:37<2:28:45, 87.51s/it] 81%|████████ | 427/528 [10:30:04<2:26:56, 87.29s/it] 81%|████████ | 428/528 [10:31:30<2:24:57, 86.98s/it] 81%|████████▏ | 429/528 [10:32:57<2:23:36, 87.04s/it] 81%|████████▏ | 430/528 [10:34:26<2:23:05, 87.61s/it] 81%|████████▏ | 430/528 [10:34:26<2:23:05, 87.61s/it] 82%|████████▏ | 431/528 [10:35:54<2:22:06, 87.91s/it] 82%|████████▏ | 432/528 [10:37:22<2:20:23, 87.75s/it] 82%|████████▏ | 433/528 [10:38:50<2:19:06, 87.85s/it] 82%|████████▏ | 434/528 [10:40:17<2:17:17, 87.63s/it] 82%|████████▏ | 435/528 [10:41:43<2:15:11, 87.22s/it] 83%|████████▎ | 436/528 [10:43:10<2:13:36, 87.14s/it] 83%|████████▎ | 437/528 [10:44:37<2:11:48, 86.90s/it] 83%|████████▎ | 438/528 [10:46:03<2:10:19, 86.89s/it] 83%|████████▎ | 439/528 [10:47:29<2:08:28, 86.61s/it] 83%|████████▎ | 440/528 [10:48:47<2:02:53, 83.79s/it] 83%|████████▎ | 440/528 [10:48:47<2:02:53, 83.79s/it][INFO|trainer.py:3719] 2024-07-14 02:29:28,386 >> ***** Running Evaluation ***** [INFO|trainer.py:3721] 2024-07-14 02:29:28,387 >> Num examples = 2500 [INFO|trainer.py:3724] 2024-07-14 02:29:28,387 >> Batch size = 1 {'eval_loss': 0.2945823669433594, 'eval_accuracy': 0.8993666666666668, 'eval_runtime': 233.0443, 'eval_samples_per_second': 10.728, 
'eval_steps_per_second': 10.728, 'epoch': 4.0} {'loss': 0.0856, 'grad_norm': 0.37293320894241333, 'learning_rate': 2.7817622282960815e-05, 'epoch': 4.09} {'loss': 0.0661, 'grad_norm': 0.5676562190055847, 'learning_rate': 2.490459770759398e-05, 'epoch': 4.2} {'loss': 0.061, 'grad_norm': 0.5680781006813049, 'learning_rate': 2.2101308941239203e-05, 'epoch': 4.32} {'loss': 0.0744, 'grad_norm': 0.690169095993042, 'learning_rate': 1.942001405240979e-05, 'epoch': 4.43} {'loss': 0.0736, 'grad_norm': 0.5858839750289917, 'learning_rate': 1.6872437661432517e-05, 'epoch': 4.55} {'loss': 0.0779, 'grad_norm': 0.6473811268806458, 'learning_rate': 1.4469719671666043e-05, 'epoch': 4.66} {'loss': 0.075, 'grad_norm': 0.3694300055503845, 'learning_rate': 1.2222366557537911e-05, 'epoch': 4.77} {'loss': 0.0752, 'grad_norm': 0.5935441851615906, 'learning_rate': 1.0140205422405214e-05, 'epoch': 4.89} {'loss': 0.0684, 'grad_norm': 0.7272607684135437, 'learning_rate': 8.232341027131885e-06, 'epoch': 5.0} 0%| | 0/2500 [00:00> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-440 /common/home/users/d/dh.huang.2023/.conda/envs/llm-perf-bench/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn( [INFO|configuration_utils.py:733] 2024-07-14 02:33:21,617 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json [INFO|configuration_utils.py:796] 2024-07-14 02:33:21,618 >> Model config InternLM2Config { "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "eager", "auto_map": { "AutoConfig": "internlm/internlm2_5-7b-chat-1m--configuration_internlm2.InternLM2Config", "AutoModel": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "internlm/internlm2_5-7b-chat-1m--modeling_internlm2.InternLM2ForCausalLM" }, "bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 262144, "model_type": "internlm2", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pad_token_id": 2, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 2.5, "type": "dynamic" }, "rope_theta": 50000000, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "vocab_size": 92544 } [INFO|tokenization_utils_base.py:2513] 2024-07-14 02:33:21,839 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-440/tokenizer_config.json [INFO|tokenization_utils_base.py:2522] 2024-07-14 02:33:21,840 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-440/special_tokens_map.json 84%|████████▎ | 441/528 [10:54:06<3:44:05, 154.55s/it] 84%|████████▎ | 442/528 [10:55:35<3:13:16, 134.84s/it] 84%|████████▍ | 443/528 [10:57:02<2:50:46, 120.54s/it] 84%|████████▍ | 444/528 [10:58:30<2:34:55, 110.66s/it] 84%|████████▍ | 445/528 [10:59:57<2:23:18, 103.59s/it] 84%|████████▍ | 446/528 [11:01:24<2:14:43, 98.58s/it] 85%|████████▍ | 447/528 [11:02:51<2:08:34, 95.24s/it] 85%|████████▍ | 448/528 [11:04:19<2:03:55, 92.94s/it] 85%|████████▌ | 449/528 
[11:05:45<1:59:46, 90.97s/it] 85%|████████▌ | 450/528 [11:07:14<1:57:21, 90.27s/it] 85%|████████▌ | 450/528 [11:07:14<1:57:21, 90.27s/it] 85%|████████▌ | 451/528 [11:08:41<1:54:35, 89.29s/it] 86%|████████▌ | 452/528 [11:10:09<1:52:34, 88.87s/it] 86%|████████▌ | 453/528 [11:11:37<1:50:50, 88.67s/it] 86%|████████▌ | 454/528 [11:13:05<1:48:59, 88.37s/it] 86%|████████▌ | 455/528 [11:14:31<1:46:34, 87.59s/it] 86%|████████▋ | 456/528 [11:15:59<1:45:27, 87.88s/it] 87%|████████▋ | 457/528 [11:17:26<1:43:34, 87.53s/it] 87%|████████▋ | 458/528 [11:18:53<1:42:08, 87.55s/it] 87%|████████▋ | 459/528 [11:20:21<1:40:51, 87.71s/it] 87%|████████▋ | 460/528 [11:21:48<1:39:09, 87.50s/it] 87%|████████▋ | 460/528 [11:21:48<1:39:09, 87.50s/it] 87%|████████▋ | 461/528 [11:23:16<1:37:49, 87.61s/it] 88%|████████▊ | 462/528 [11:24:43<1:35:56, 87.22s/it] 88%|████████▊ | 463/528 [11:26:11<1:34:55, 87.62s/it] 88%|████████▊ | 464/528 [11:27:39<1:33:30, 87.66s/it] 88%|████████▊ | 465/528 [11:29:06<1:31:51, 87.48s/it] 88%|████████▊ | 466/528 [11:30:34<1:30:26, 87.52s/it] 88%|████████▊ | 467/528 [11:32:01<1:28:50, 87.39s/it] 89%|████████▊ | 468/528 [11:33:28<1:27:18, 87.30s/it] 89%|████████▉ | 469/528 [11:34:54<1:25:25, 86.88s/it] 89%|████████▉ | 470/528 [11:36:22<1:24:18, 87.22s/it] 89%|████████▉ | 470/528 [11:36:22<1:24:18, 87.22s/it] 89%|████████▉ | 471/528 [11:37:48<1:22:42, 87.05s/it] 89%|████████▉ | 472/528 [11:39:17<1:21:40, 87.51s/it] 90%|████████▉ | 473/528 [11:40:45<1:20:25, 87.74s/it] 90%|████████▉ | 474/528 [11:42:13<1:19:03, 87.85s/it] 90%|████████▉ | 475/528 [11:43:40<1:17:18, 87.52s/it] 90%|█████████ | 476/528 [11:45:05<1:15:14, 86.81s/it] 90%|█████████ | 477/528 [11:46:33<1:14:04, 87.14s/it] 91%|█████████ | 478/528 [11:48:01<1:12:41, 87.24s/it] 91%|█████████ | 479/528 [11:49:27<1:11:03, 87.01s/it] 91%|█████████ | 480/528 [11:50:53<1:09:24, 86.77s/it] 91%|█████████ | 480/528 [11:50:53<1:09:24, 86.77s/it] 91%|█████████ | 481/528 [11:52:21<1:08:06, 86.96s/it] 91%|█████████▏| 482/528 [11:53:49<1:06:53, 87.25s/it] 91%|█████████▏| 483/528 [11:55:17<1:05:44, 87.65s/it] 92%|█████████▏| 484/528 [11:56:43<1:03:55, 87.16s/it] 92%|█████████▏| 485/528 [11:58:11<1:02:29, 87.20s/it] 92%|█████████▏| 486/528 [11:59:38<1:01:08, 87.35s/it] 92%|█████████▏| 487/528 [12:01:06<59:40, 87.34s/it] 92%|█████████▏| 488/528 [12:02:32<58:02, 87.07s/it] 93%|█████████▎| 489/528 [12:04:00<56:41, 87.23s/it] 93%|█████████▎| 490/528 [12:05:26<54:59, 86.83s/it] 93%|█████████▎| 490/528 [12:05:26<54:59, 86.83s/it] 93%|█████████▎| 491/528 [12:06:53<53:37, 86.97s/it] 93%|█████████▎| 492/528 [12:08:20<52:10, 86.97s/it] 93%|█████████▎| 493/528 [12:09:46<50:38, 86.82s/it] 94%|█████████▎| 494/528 [12:11:16<49:41, 87.70s/it] 94%|█████████▍| 495/528 [12:12:45<48:22, 87.96s/it] 94%|█████████▍| 496/528 [12:14:12<46:51, 87.85s/it] 94%|█████████▍| 497/528 [12:15:40<45:22, 87.82s/it] 94%|█████████▍| 498/528 [12:17:08<43:54, 87.82s/it] 95%|█████████▍| 499/528 [12:18:34<42:13, 87.36s/it] 95%|█████████▍| 500/528 [12:20:01<40:41, 87.19s/it] 95%|█████████▍| 500/528 [12:20:01<40:41, 87.19s/it] 95%|█████████▍| 501/528 [12:21:28<39:14, 87.21s/it] 95%|█████████▌| 502/528 [12:22:55<37:48, 87.24s/it] 95%|█████████▌| 503/528 [12:24:22<36:15, 87.04s/it] 95%|█████████▌| 504/528 [12:25:50<34:58, 87.42s/it] 96%|█████████▌| 505/528 [12:27:17<33:25, 87.20s/it] 96%|█████████▌| 506/528 [12:28:45<32:01, 87.35s/it] 96%|█████████▌| 507/528 [12:30:10<30:24, 86.88s/it] 96%|█████████▌| 508/528 [12:31:37<28:56, 86.81s/it] 96%|█████████▋| 509/528 [12:33:05<27:37, 87.22s/it] 
[training progress bar: steps 510-528 of 528; the training loop reaches 528/528 at 13:00:34 elapsed, 84.44 s/it average]
[INFO|trainer.py:3719] 2024-07-14 04:41:15,426 >> ***** Running Evaluation *****
[INFO|trainer.py:3721] 2024-07-14 04:41:15,426 >>   Num examples = 2500
[INFO|trainer.py:3724] 2024-07-14 04:41:15,426 >>   Batch size = 1
{'eval_loss': 0.369967520236969, 'eval_accuracy': 0.8996666666666668, 'eval_runtime': 232.4252, 'eval_samples_per_second': 10.756, 'eval_steps_per_second': 10.756, 'epoch': 5.0}
{'loss': 0.0531, 'grad_norm': 0.3855780363082886, 'learning_rate': 6.5071159772861436e-06, 'epoch': 5.11}
{'loss': 0.0437, 'grad_norm': 0.3242223560810089, 'learning_rate': 4.972074243048897e-06, 'epoch': 5.23}
{'loss': 0.0463, 'grad_norm': 0.36955130100250244, 'learning_rate': 3.6339281713517303e-06, 'epoch': 5.34}
{'loss': 0.0485, 'grad_norm': 0.3851165473461151, 'learning_rate': 2.4985291344915674e-06, 'epoch': 5.45}
{'loss': 0.0495, 'grad_norm': 0.30520951747894287, 'learning_rate': 1.5708419435684462e-06, 'epoch': 5.57}
{'loss': 0.0484, 'grad_norm': 0.8094011545181274, 'learning_rate': 8.549231386298151e-07, 'epoch': 5.68}
{'loss': 0.0384, 'grad_norm': 0.21888971328735352, 'learning_rate': 3.5390325045304706e-07, 'epoch': 5.8}
{'loss': 0.0486, 'grad_norm': 0.4017506539821625, 'learning_rate': 6.997311153086883e-08, 'epoch': 5.91}
  0%|          | 0/2500 [00:00<?, ?it/s]
>> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-528
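Note: every checkpoint directory written above also contains a trainer_state.json with the full log_history, so the loss and eval curves can be re-plotted offline without rerunning the job. A minimal sketch, assuming the standard Hugging Face Trainer checkpoint layout and that matplotlib is available:

import json
import matplotlib.pyplot as plt

# trainer_state.json is written by the HF Trainer into every checkpoint directory.
state_path = "saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-528/trainer_state.json"
with open(state_path) as f:
    history = json.load(f)["log_history"]

train = [(h["epoch"], h["loss"]) for h in history if "loss" in h]
evals = [(h["epoch"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train), label="train loss")
plt.plot(*zip(*evals), marker="o", label="eval loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")  # hypothetical output name; plot_loss already saves similar figures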
[INFO|configuration_utils.py:733] 2024-07-14 04:45:09,134 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json
[INFO|configuration_utils.py:796] 2024-07-14 04:45:09,135 >> Model config InternLM2Config { ... } [repeat of the InternLM2Config dump shown above at checkpoint-440]
[INFO|tokenization_utils_base.py:2513] 2024-07-14 04:45:09,343 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-528/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-14 04:45:09,345 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/checkpoint-528/special_tokens_map.json
[INFO|trainer.py:2329] 2024-07-14 04:45:09,798 >> Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 528/528 [13:04:28<00:00, 89.15s/it]
[INFO|trainer.py:3410] 2024-07-14 04:45:09,802 >> Saving model checkpoint to saves/internlm2_5_7b/lora/sft_bf16_p2_full
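Note: the final adapter saved to saves/internlm2_5_7b/lora/sft_bf16_p2_full can be merged into the base weights for standalone deployment. This is a hedged sketch using PEFT's merge_and_unload; the output directory name is illustrative only, and LLaMA-Factory's llamafactory-cli export offers an equivalent built-in route.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "internlm/internlm2_5-7b-chat-1m"
adapter_dir = "saves/internlm2_5_7b/lora/sft_bf16_p2_full"
merged_dir = "models/internlm2_5_7b_mgtv_p2_merged"  # hypothetical output path

base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(merged_dir, safe_serialization=True)
AutoTokenizer.from_pretrained(base_id, trust_remote_code=True).save_pretrained(merged_dir)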
[INFO|configuration_utils.py:733] 2024-07-14 04:45:10,284 >> loading configuration file config.json from cache at /common/scratch/users/d/dh.huang.2023/transformers/hub/models--internlm--internlm2_5-7b-chat-1m/snapshots/8d1a709a04d71440ef3df6ebbe204672f411c8b6/config.json
[INFO|configuration_utils.py:796] 2024-07-14 04:45:10,285 >> Model config InternLM2Config { ... } [repeat of the InternLM2Config dump shown above at checkpoint-440]
[INFO|tokenization_utils_base.py:2513] 2024-07-14 04:45:10,484 >> tokenizer config file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-14 04:45:10,486 >> Special tokens file saved in saves/internlm2_5_7b/lora/sft_bf16_p2_full/special_tokens_map.json
[INFO|trainer.py:3719] 2024-07-14 04:45:10,951 >> ***** Running Evaluation *****
[INFO|trainer.py:3721] 2024-07-14 04:45:10,951 >>   Num examples = 2500
[INFO|trainer.py:3724] 2024-07-14 04:45:10,951 >>   Batch size = 1
{'eval_loss': 0.4368518590927124, 'eval_accuracy': 0.8984, 'eval_runtime': 232.9443, 'eval_samples_per_second': 10.732, 'eval_steps_per_second': 10.732, 'epoch': 6.0}
{'train_runtime': 47077.7992, 'train_samples_per_second': 2.868, 'train_steps_per_second': 0.011, 'train_loss': 0.28717788867652416, 'epoch': 6.0}
***** train metrics *****
  epoch                    = 6.0
  total_flos               = 2704704304GF
  train_loss               = 0.2872
  train_runtime            = 13:04:37.79
  train_samples_per_second = 2.868
  train_steps_per_second   = 0.011
Figure saved at: saves/internlm2_5_7b/lora/sft_bf16_p2_full/training_loss.png
Figure saved at: saves/internlm2_5_7b/lora/sft_bf16_p2_full/training_eval_loss.png
Figure saved at: saves/internlm2_5_7b/lora/sft_bf16_p2_full/training_eval_accuracy.png
  0%|          | 0/2500 [00:00<?, ?it/s]
>> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.8984}]}
***** eval metrics *****
  epoch                   = 6.0
  eval_accuracy           = 0.8984
  eval_loss               = 0.4369
  eval_runtime            = 0:03:52.12
  eval_samples_per_second = 10.77
  eval_steps_per_second   = 10.77
wandb: 0.062 MB of 0.062 MB uploaded
wandb:
wandb: Run history:
wandb:            eval/accuracy █▄▇▂▃▁▁
wandb:                eval/loss ▁▁▂▂▅██
wandb:             eval/runtime ▄▂▁█▆█▅
wandb:  eval/samples_per_second ▅▇█▁▃▁▄
wandb:    eval/steps_per_second ▅▇█▁▃▁▄
wandb:              train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇████
wandb:        train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇████
wandb:          train/grad_norm █▃▂▂▁▂▂▂▁▂▂▂▃▁▂▂▂▂▁▂▁▂▁▁▁▂▂▁▂▂▂▂▁▂▁▁▁▁▂▁
wandb:      train/learning_rate ▂▄▅▆██████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▁▁▁▁▁▁
wandb:               train/loss █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:
wandb: Run summary:
wandb:            eval/accuracy 0.8984
wandb:                eval/loss 0.43685
wandb:             eval/runtime 232.12
wandb:  eval/samples_per_second 10.77
wandb:    eval/steps_per_second 10.77
wandb:               total_flos 2.9041541335076045e+18
wandb:              train/epoch 6.0
wandb:        train/global_step 528
wandb:          train/grad_norm 0.40175
wandb:      train/learning_rate 0.0
wandb:               train/loss 0.0486
wandb:               train_loss 0.28718
wandb:            train_runtime 47077.7992
wandb: train_samples_per_second 2.868
wandb:   train_steps_per_second 0.011
wandb:
wandb: 🚀 View run internlm2_5_7b_p2_l40 at: https://wandb.ai/inflaton-ai/huggingface/runs/dpm5rcwx
wandb: ⭐️ View project at: https://wandb.ai/inflaton-ai/huggingface
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240713_154033-dpm5rcwx/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
Job ID: 71125
Cluster: crimson
User/Group: dh.huang.2023/dh.huang.2023
State: RUNNING
Nodes: 1
Cores per node: 10
CPU Utilized: 12:46:29
CPU Efficiency: 9.71% of 5-11:32:10 core-walltime
Job Wall-clock time: 13:09:13
Memory Utilized: 2.41 GB
Memory Efficiency: 0.94% of 256.00 GB
WARNING: Efficiency statistics may be misleading for RUNNING jobs.
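Note: the scheduler summary above reports efficiency as the ratio of time and memory actually used to what the allocation reserved, so low percentages are unsurprising for a single-GPU fine-tune where the 10 CPU cores and 256 GB of host memory sit mostly idle. A small check of the reported figures, using only the values printed above:

# Reproduce the CPU/memory efficiency figures reported by the scheduler summary.
def to_seconds(t):
    """Parse 'D-HH:MM:SS' or 'HH:MM:SS' into seconds."""
    days, _, rest = t.partition("-") if "-" in t else ("0", "", t)
    h, m, s = (int(x) for x in rest.split(":"))
    return int(days) * 86400 + h * 3600 + m * 60 + s

cpu_used = to_seconds("12:46:29")          # CPU time actually consumed
core_walltime = to_seconds("5-11:32:10")   # 10 cores x 13:09:13 wall-clock
print(f"CPU efficiency: {cpu_used / core_walltime:.2%}")   # ~9.71%
print(f"Memory efficiency: {2.41 / 256.00:.2%}")           # ~0.94%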