06/17/2024 19:50:24 - INFO - transformers.models.auto.tokenization_auto - Could not locate the tokenizer configuration file, will try to use the model config instead.
06/17/2024 19:50:25 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 19:50:25 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-1.5B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file vocab.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/vocab.json
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file merges.txt from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/merges.txt
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/tokenizer.json
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at None
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/tokenizer_config.json
06/17/2024 19:50:31 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 19:50:31 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 19:50:31 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 19:50:31 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 19:50:31 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 19:50:37 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 19:50:43 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 19:50:47 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
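The entries above show the Qwen2-1.5B-Instruct tokenizer and the Chinese/English Glaive tool-calling datasets being pulled from the local Hugging Face cache. A minimal sketch of the same loading step outside of LLaMA-Factory, assuming the transformers and datasets packages are installed and the Hub is reachable (variable names below are illustrative only), might look like this:

# Sketch: load the tokenizer and the tool-calling datasets named in the log.
from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# LLaMA-Factory loads both the Chinese and English Glaive tool-calling sets.
toolcall_zh = load_dataset("llamafactory/glaive_toolcall_zh", split="train")
toolcall_en = load_dataset("llamafactory/glaive_toolcall_en", split="train")

# The "Replace eos token: <|im_end|>" lines reflect the ChatML-style template
# LLaMA-Factory applies for Qwen2; here we only confirm what was downloaded.
print(tokenizer.eos_token)
print(len(toolcall_zh), len(toolcall_en))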
06/17/2024 19:50:52 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 19:50:52 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-1.5B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 19:50:52 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 19:50:52 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 19:50:59 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/model.safetensors
06/17/2024 19:50:59 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/17/2024 19:50:59 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
06/17/2024 19:51:06 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/17/2024 19:51:06 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-1.5B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/17/2024 19:51:06 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/generation_config.json
06/17/2024 19:51:06 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.1, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,v_proj,down_proj,gate_proj,k_proj,up_proj,q_proj
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
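"Quantizing model to 4 bit", the float16 load, and the "Found linear modules" list correspond to a QLoRA-style setup: the base weights are loaded through bitsandbytes 4-bit quantization and LoRA adapters are attached to every linear projection. A rough equivalent with transformers + peft is sketched below; a rank of 8 reproduces the 9,232,384 trainable parameters reported later in the log, while lora_alpha and lora_dropout are assumptions rather than values read from the run:

# Sketch of the QLoRA-style load implied by the log; hyperparameters marked
# "assumed" are illustrative, not taken from this run.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # "Quantizing model to 4 bit."
    bnb_4bit_compute_dtype=torch.float16,   # matches the float16 load dtype in the log
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)
# Roughly mirrors "Upcasting trainable params to float32." and
# "Gradient checkpointing enabled." from the log.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                    # reproduces 9,232,384 trainable params
    lora_alpha=16, lora_dropout=0.05,       # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # the "Found linear modules" list
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()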
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.misc - Found linear modules: down_proj,k_proj,gate_proj,up_proj,o_proj,v_proj,q_proj
06/17/2024 19:51:07 - INFO - llamafactory.model.loader - trainable params: 9232384 || all params: 1552946688 || trainable%: 0.5945
06/17/2024 19:51:07 - INFO - llamafactory.model.loader - trainable params: 9232384 || all params: 1552946688 || trainable%: 0.5945
06/17/2024 19:51:07 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/17/2024 19:51:07 - INFO - transformers.trainer - Using auto half precision backend
06/17/2024 19:51:07 - INFO - transformers.trainer - ***** Running training *****
06/17/2024 19:51:07 - INFO - transformers.trainer - Num examples = 2,000
06/17/2024 19:51:07 - INFO - transformers.trainer - Num Epochs = 3
06/17/2024 19:51:07 - INFO - transformers.trainer - Instantaneous batch size per device = 2
06/17/2024 19:51:07 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 32
06/17/2024 19:51:07 - INFO - transformers.trainer - Gradient Accumulation steps = 8
06/17/2024 19:51:07 - INFO - transformers.trainer - Total optimization steps = 186
06/17/2024 19:51:07 - INFO - transformers.trainer - Number of trainable parameters = 9,232,384
06/17/2024 19:51:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.8099, 'learning_rate': 4.9911e-05, 'epoch': 0.08, 'throughput': 2566.78}
06/17/2024 19:52:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.9580, 'learning_rate': 4.9644e-05, 'epoch': 0.16, 'throughput': 2542.66}
06/17/2024 19:53:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.7150, 'learning_rate': 4.9202e-05, 'epoch': 0.24, 'throughput': 2505.70}
06/17/2024 19:54:02 - INFO - llamafactory.extras.callbacks - {'loss': 0.7585, 'learning_rate': 4.8587e-05, 'epoch': 0.32, 'throughput': 2511.00}
06/17/2024 19:54:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.7342, 'learning_rate': 4.7804e-05, 'epoch': 0.40, 'throughput': 2533.17}
06/17/2024 19:55:16 - INFO - llamafactory.extras.callbacks - {'loss': 0.6904, 'learning_rate': 4.6859e-05, 'epoch': 0.48, 'throughput': 2557.91}
06/17/2024 19:55:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.8254, 'learning_rate': 4.5757e-05, 'epoch': 0.56, 'throughput': 2587.35}
06/17/2024 19:56:34 - INFO - llamafactory.extras.callbacks - {'loss': 0.7551, 'learning_rate': 4.4508e-05, 'epoch': 0.64, 'throughput': 2578.43}
06/17/2024 19:57:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.7747, 'learning_rate': 4.3120e-05, 'epoch': 0.72, 'throughput': 2580.98}
06/17/2024 19:57:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.7027, 'learning_rate': 4.1602e-05, 'epoch': 0.80, 'throughput': 2578.88}
06/17/2024 19:58:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.7581, 'learning_rate': 3.9967e-05, 'epoch': 0.88, 'throughput': 2580.17}
06/17/2024 19:59:13 - INFO - llamafactory.extras.callbacks - {'loss': 0.7221, 'learning_rate': 3.8224e-05, 'epoch': 0.96, 'throughput': 2583.14}
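The trainer header is internally consistent: a per-device batch of 2 with gradient accumulation of 8 and a total batch of 32 implies two processes (which also explains the duplicated loader and quantization lines above); 2,000 examples at an effective batch of 32 give 62 update steps per epoch, so 3 epochs yield 186 optimization steps; and 9,232,384 trainable out of 1,552,946,688 total parameters is the logged 0.5945%. A small sanity-check sketch:

# Sanity-check the run arithmetic reported by the trainer above.
per_device_batch = 2
grad_accum       = 8
total_batch      = 32
num_examples     = 2000
num_epochs       = 3

world_size      = total_batch // (per_device_batch * grad_accum)   # 2 GPUs/processes
steps_per_epoch = num_examples // total_batch                      # 62 (the trainer floors the partial step)
total_steps     = steps_per_epoch * num_epochs                     # 186, as logged

trainable, total = 9_232_384, 1_552_946_688
print(world_size, total_steps, f"{100 * trainable / total:.4f}%")  # 2 186 0.5945%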
06/17/2024 19:59:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.8214, 'learning_rate': 3.6387e-05, 'epoch': 1.04, 'throughput': 2587.56}
06/17/2024 20:00:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.6304, 'learning_rate': 3.4469e-05, 'epoch': 1.12, 'throughput': 2580.91}
06/17/2024 20:01:12 - INFO - llamafactory.extras.callbacks - {'loss': 0.6434, 'learning_rate': 3.2484e-05, 'epoch': 1.20, 'throughput': 2577.50}
06/17/2024 20:01:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.6796, 'learning_rate': 3.0445e-05, 'epoch': 1.28, 'throughput': 2575.34}
06/17/2024 20:02:35 - INFO - llamafactory.extras.callbacks - {'loss': 0.6651, 'learning_rate': 2.8368e-05, 'epoch': 1.36, 'throughput': 2580.63}
06/17/2024 20:03:17 - INFO - llamafactory.extras.callbacks - {'loss': 0.7844, 'learning_rate': 2.6266e-05, 'epoch': 1.44, 'throughput': 2586.23}
06/17/2024 20:04:01 - INFO - llamafactory.extras.callbacks - {'loss': 0.8139, 'learning_rate': 2.4156e-05, 'epoch': 1.52, 'throughput': 2581.43}
06/17/2024 20:04:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.6717, 'learning_rate': 2.2051e-05, 'epoch': 1.60, 'throughput': 2578.63}
06/17/2024 20:04:43 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100
06/17/2024 20:04:44 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 20:04:44 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 20:04:44 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/tokenizer_config.json
06/17/2024 20:04:44 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/special_tokens_map.json
06/17/2024 20:05:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.7524, 'learning_rate': 1.9968e-05, 'epoch': 1.68, 'throughput': 2577.77}
06/17/2024 20:06:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.6310, 'learning_rate': 1.7920e-05, 'epoch': 1.76, 'throughput': 2578.49}
06/17/2024 20:06:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.7462, 'learning_rate': 1.5923e-05, 'epoch': 1.84, 'throughput': 2578.25}
06/17/2024 20:07:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.6148, 'learning_rate': 1.3990e-05, 'epoch': 1.92, 'throughput': 2578.95}
06/17/2024 20:08:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.7145, 'learning_rate': 1.2136e-05, 'epoch': 2.00, 'throughput': 2582.83}
06/17/2024 20:08:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.6798, 'learning_rate': 1.0374e-05, 'epoch': 2.08, 'throughput': 2583.35}
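Because this run logs loss, learning rate and throughput through llamafactory.extras.callbacks rather than an evaluation loop, a quick way to visualise the training curve is to parse those callback lines back out of the saved log. A small, self-contained sketch, where "running_log.txt" is a placeholder path for wherever this log was written:

# Sketch: recover (epoch, loss) pairs from the callback lines and plot them.
import ast
import re

import matplotlib.pyplot as plt

pattern = re.compile(r"llamafactory\.extras\.callbacks - (\{.*\})")
points = []
with open("running_log.txt", encoding="utf-8") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            record = ast.literal_eval(m.group(1))   # {'loss': ..., 'epoch': ..., ...}
            points.append((record["epoch"], record["loss"]))

epochs, losses = zip(*points)
plt.plot(epochs, losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")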
06/17/2024 20:09:31 - INFO - llamafactory.extras.callbacks - {'loss': 0.6754, 'learning_rate': 8.7157e-06, 'epoch': 2.16, 'throughput': 2584.30}
06/17/2024 20:10:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.6708, 'learning_rate': 7.1737e-06, 'epoch': 2.24, 'throughput': 2584.52}
06/17/2024 20:10:46 - INFO - llamafactory.extras.callbacks - {'loss': 0.6386, 'learning_rate': 5.7587e-06, 'epoch': 2.32, 'throughput': 2587.95}
06/17/2024 20:11:24 - INFO - llamafactory.extras.callbacks - {'loss': 0.6995, 'learning_rate': 4.4809e-06, 'epoch': 2.40, 'throughput': 2590.06}
06/17/2024 20:12:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.6691, 'learning_rate': 3.3494e-06, 'epoch': 2.48, 'throughput': 2593.15}
06/17/2024 20:12:50 - INFO - llamafactory.extras.callbacks - {'loss': 0.6024, 'learning_rate': 2.3721e-06, 'epoch': 2.56, 'throughput': 2588.49}
06/17/2024 20:13:28 - INFO - llamafactory.extras.callbacks - {'loss': 0.6484, 'learning_rate': 1.5562e-06, 'epoch': 2.64, 'throughput': 2591.26}
06/17/2024 20:14:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.7137, 'learning_rate': 9.0736e-07, 'epoch': 2.72, 'throughput': 2585.69}
06/17/2024 20:14:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.7770, 'learning_rate': 4.3025e-07, 'epoch': 2.80, 'throughput': 2589.29}
06/17/2024 20:15:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.6529, 'learning_rate': 1.2827e-07, 'epoch': 2.88, 'throughput': 2590.50}
06/17/2024 20:16:16 - INFO - llamafactory.extras.callbacks - {'loss': 0.6996, 'learning_rate': 3.5659e-09, 'epoch': 2.96, 'throughput': 2590.41}
06/17/2024 20:16:26 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/17/2024 20:16:26 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05
06/17/2024 20:16:27 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 20:16:27 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 20:16:27 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/tokenizer_config.json
06/17/2024 20:16:27 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/special_tokens_map.json
06/17/2024 20:16:27 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/17/2024 20:16:27 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
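Training finished with the LoRA adapter and tokenizer files saved under saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05; the "No metric eval_loss to plot" warning only means no validation split was configured. One way to try the result, assuming the peft package is available and the final adapter files sit at the top of that output directory, is sketched below; the prompt is purely illustrative:

# Sketch: load the finished LoRA adapter on top of the base model for a quick test.
# The adapter path comes from the "Saving model checkpoint to ..." line above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05"

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)
model = model.merge_and_unload()   # optional: fold the adapter into the base weights

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=128)[0], skip_special_tokens=True))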