06/17/2024 19:50:24 - INFO - transformers.models.auto.tokenization_auto - Could not locate the tokenizer configuration file, will try to use the model config instead.
06/17/2024 19:50:25 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 19:50:25 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-1.5B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file vocab.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/vocab.json
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file merges.txt from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/merges.txt
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/tokenizer.json
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at None
06/17/2024 19:50:30 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/tokenizer_config.json
06/17/2024 19:50:31 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 19:50:31 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 19:50:31 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/17/2024 19:50:31 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/17/2024 19:50:31 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 19:50:37 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
06/17/2024 19:50:43 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_zh...
06/17/2024 19:50:47 - INFO - llamafactory.data.loader - Loading dataset llamafactory/glaive_toolcall_en...
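The entries above show the Qwen2-1.5B-Instruct tokenizer and the Chinese/English Glaive tool-calling datasets being pulled from the local Hugging Face cache. A minimal sketch of the same loading step outside of LLaMA-Factory, assuming the transformers and datasets packages are installed and the Hub is reachable (variable names below are illustrative only), might look like this:

# Sketch: load the tokenizer and the tool-calling datasets named in the log.
from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# LLaMA-Factory loads both the Chinese and English Glaive tool-calling sets.
toolcall_zh = load_dataset("llamafactory/glaive_toolcall_zh", split="train")
toolcall_en = load_dataset("llamafactory/glaive_toolcall_en", split="train")

# The "Replace eos token: <|im_end|>" lines reflect the ChatML-style template
# LLaMA-Factory applies for Qwen2; here we only confirm what was downloaded.
print(tokenizer.eos_token)
print(len(toolcall_zh), len(toolcall_en))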
06/17/2024 19:50:52 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 19:50:52 - INFO - transformers.configuration_utils - Model config Qwen2Config { "_name_or_path": "Qwen/Qwen2-1.5B-Instruct", "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 19:50:52 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 19:50:52 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit.
06/17/2024 19:50:59 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/model.safetensors
06/17/2024 19:50:59 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/17/2024 19:50:59 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
06/17/2024 19:51:06 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/17/2024 19:51:06 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-1.5B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/17/2024 19:51:06 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/generation_config.json
06/17/2024 19:51:06 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.1, "temperature": 0.7, "top_k": 20, "top_p": 0.8 }
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,v_proj,down_proj,gate_proj,k_proj,up_proj,q_proj
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
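"Quantizing model to 4 bit", the float16 load, and the "Found linear modules" list correspond to a QLoRA-style setup: the base weights are loaded through bitsandbytes 4-bit quantization and LoRA adapters are attached to every linear projection. A rough equivalent with transformers + peft is sketched below; a rank of 8 reproduces the 9,232,384 trainable parameters reported later in the log, while lora_alpha and lora_dropout are assumptions rather than values read from the run:

# Sketch of the QLoRA-style load implied by the log; hyperparameters marked
# "assumed" are illustrative, not taken from this run.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # "Quantizing model to 4 bit."
    bnb_4bit_compute_dtype=torch.float16,   # matches the float16 load dtype in the log
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
)
# Roughly mirrors "Upcasting trainable params to float32." and
# "Gradient checkpointing enabled." from the log.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                    # reproduces 9,232,384 trainable params
    lora_alpha=16, lora_dropout=0.05,       # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # the "Found linear modules" list
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()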
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/17/2024 19:51:07 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/17/2024 19:51:07 - INFO - llamafactory.model.model_utils.misc - Found linear modules: down_proj,k_proj,gate_proj,up_proj,o_proj,v_proj,q_proj
06/17/2024 19:51:07 - INFO - llamafactory.model.loader - trainable params: 9232384 || all params: 1552946688 || trainable%: 0.5945
06/17/2024 19:51:07 - INFO - llamafactory.model.loader - trainable params: 9232384 || all params: 1552946688 || trainable%: 0.5945
06/17/2024 19:51:07 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/17/2024 19:51:07 - INFO - transformers.trainer - Using auto half precision backend
06/17/2024 19:51:07 - INFO - transformers.trainer - ***** Running training *****
06/17/2024 19:51:07 - INFO - transformers.trainer - Num examples = 2,000
06/17/2024 19:51:07 - INFO - transformers.trainer - Num Epochs = 3
06/17/2024 19:51:07 - INFO - transformers.trainer - Instantaneous batch size per device = 2
06/17/2024 19:51:07 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 32
06/17/2024 19:51:07 - INFO - transformers.trainer - Gradient Accumulation steps = 8
06/17/2024 19:51:07 - INFO - transformers.trainer - Total optimization steps = 186
06/17/2024 19:51:07 - INFO - transformers.trainer - Number of trainable parameters = 9,232,384
06/17/2024 19:51:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.8099, 'learning_rate': 4.9911e-05, 'epoch': 0.08, 'throughput': 2566.78}
06/17/2024 19:52:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.9580, 'learning_rate': 4.9644e-05, 'epoch': 0.16, 'throughput': 2542.66}
06/17/2024 19:53:18 - INFO - llamafactory.extras.callbacks - {'loss': 0.7150, 'learning_rate': 4.9202e-05, 'epoch': 0.24, 'throughput': 2505.70}
06/17/2024 19:54:02 - INFO - llamafactory.extras.callbacks - {'loss': 0.7585, 'learning_rate': 4.8587e-05, 'epoch': 0.32, 'throughput': 2511.00}
06/17/2024 19:54:37 - INFO - llamafactory.extras.callbacks - {'loss': 0.7342, 'learning_rate': 4.7804e-05, 'epoch': 0.40, 'throughput': 2533.17}
06/17/2024 19:55:16 - INFO - llamafactory.extras.callbacks - {'loss': 0.6904, 'learning_rate': 4.6859e-05, 'epoch': 0.48, 'throughput': 2557.91}
06/17/2024 19:55:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.8254, 'learning_rate': 4.5757e-05, 'epoch': 0.56, 'throughput': 2587.35}
06/17/2024 19:56:34 - INFO - llamafactory.extras.callbacks - {'loss': 0.7551, 'learning_rate': 4.4508e-05, 'epoch': 0.64, 'throughput': 2578.43}
06/17/2024 19:57:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.7747, 'learning_rate': 4.3120e-05, 'epoch': 0.72, 'throughput': 2580.98}
06/17/2024 19:57:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.7027, 'learning_rate': 4.1602e-05, 'epoch': 0.80, 'throughput': 2578.88}
06/17/2024 19:58:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.7581, 'learning_rate': 3.9967e-05, 'epoch': 0.88, 'throughput': 2580.17}
06/17/2024 19:59:13 - INFO - llamafactory.extras.callbacks - {'loss': 0.7221, 'learning_rate': 3.8224e-05, 'epoch': 0.96, 'throughput': 2583.14}
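The trainer header is internally consistent: a per-device batch of 2 with gradient accumulation of 8 and a total batch of 32 implies two processes (which also explains the duplicated loader and quantization lines above); 2,000 examples at an effective batch of 32 give 62 update steps per epoch, so 3 epochs yield 186 optimization steps; and 9,232,384 trainable out of 1,552,946,688 total parameters is the logged 0.5945%. A small sanity-check sketch:

# Sanity-check the run arithmetic reported by the trainer above.
per_device_batch = 2
grad_accum       = 8
total_batch      = 32
num_examples     = 2000
num_epochs       = 3

world_size      = total_batch // (per_device_batch * grad_accum)   # 2 GPUs/processes
steps_per_epoch = num_examples // total_batch                      # 62 (the trainer floors the partial step)
total_steps     = steps_per_epoch * num_epochs                     # 186, as logged

trainable, total = 9_232_384, 1_552_946_688
print(world_size, total_steps, f"{100 * trainable / total:.4f}%")  # 2 186 0.5945%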
06/17/2024 19:59:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.8214, 'learning_rate': 3.6387e-05, 'epoch': 1.04, 'throughput': 2587.56}
06/17/2024 20:00:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.6304, 'learning_rate': 3.4469e-05, 'epoch': 1.12, 'throughput': 2580.91}
06/17/2024 20:01:12 - INFO - llamafactory.extras.callbacks - {'loss': 0.6434, 'learning_rate': 3.2484e-05, 'epoch': 1.20, 'throughput': 2577.50}
06/17/2024 20:01:55 - INFO - llamafactory.extras.callbacks - {'loss': 0.6796, 'learning_rate': 3.0445e-05, 'epoch': 1.28, 'throughput': 2575.34}
06/17/2024 20:02:35 - INFO - llamafactory.extras.callbacks - {'loss': 0.6651, 'learning_rate': 2.8368e-05, 'epoch': 1.36, 'throughput': 2580.63}
06/17/2024 20:03:17 - INFO - llamafactory.extras.callbacks - {'loss': 0.7844, 'learning_rate': 2.6266e-05, 'epoch': 1.44, 'throughput': 2586.23}
06/17/2024 20:04:01 - INFO - llamafactory.extras.callbacks - {'loss': 0.8139, 'learning_rate': 2.4156e-05, 'epoch': 1.52, 'throughput': 2581.43}
06/17/2024 20:04:43 - INFO - llamafactory.extras.callbacks - {'loss': 0.6717, 'learning_rate': 2.2051e-05, 'epoch': 1.60, 'throughput': 2578.63}
06/17/2024 20:04:43 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100
06/17/2024 20:04:44 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 20:04:44 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 20:04:44 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/tokenizer_config.json
06/17/2024 20:04:44 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/checkpoint-100/special_tokens_map.json
06/17/2024 20:05:27 - INFO - llamafactory.extras.callbacks - {'loss': 0.7524, 'learning_rate': 1.9968e-05, 'epoch': 1.68, 'throughput': 2577.77}
06/17/2024 20:06:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.6310, 'learning_rate': 1.7920e-05, 'epoch': 1.76, 'throughput': 2578.49}
06/17/2024 20:06:48 - INFO - llamafactory.extras.callbacks - {'loss': 0.7462, 'learning_rate': 1.5923e-05, 'epoch': 1.84, 'throughput': 2578.25}
06/17/2024 20:07:32 - INFO - llamafactory.extras.callbacks - {'loss': 0.6148, 'learning_rate': 1.3990e-05, 'epoch': 1.92, 'throughput': 2578.95}
06/17/2024 20:08:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.7145, 'learning_rate': 1.2136e-05, 'epoch': 2.00, 'throughput': 2582.83}
06/17/2024 20:08:52 - INFO - llamafactory.extras.callbacks - {'loss': 0.6798, 'learning_rate': 1.0374e-05, 'epoch': 2.08, 'throughput': 2583.35}
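Because this run logs loss, learning rate and throughput through llamafactory.extras.callbacks rather than an evaluation loop, a quick way to visualise the training curve is to parse those callback lines back out of the saved log. A small, self-contained sketch, where "running_log.txt" is a placeholder path for wherever this log was written:

# Sketch: recover (epoch, loss) pairs from the callback lines and plot them.
import ast
import re

import matplotlib.pyplot as plt

pattern = re.compile(r"llamafactory\.extras\.callbacks - (\{.*\})")
points = []
with open("running_log.txt", encoding="utf-8") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            record = ast.literal_eval(m.group(1))   # {'loss': ..., 'epoch': ..., ...}
            points.append((record["epoch"], record["loss"]))

epochs, losses = zip(*points)
plt.plot(epochs, losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")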
06/17/2024 20:09:31 - INFO - llamafactory.extras.callbacks - {'loss': 0.6754, 'learning_rate': 8.7157e-06, 'epoch': 2.16, 'throughput': 2584.30}
06/17/2024 20:10:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.6708, 'learning_rate': 7.1737e-06, 'epoch': 2.24, 'throughput': 2584.52}
06/17/2024 20:10:46 - INFO - llamafactory.extras.callbacks - {'loss': 0.6386, 'learning_rate': 5.7587e-06, 'epoch': 2.32, 'throughput': 2587.95}
06/17/2024 20:11:24 - INFO - llamafactory.extras.callbacks - {'loss': 0.6995, 'learning_rate': 4.4809e-06, 'epoch': 2.40, 'throughput': 2590.06}
06/17/2024 20:12:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.6691, 'learning_rate': 3.3494e-06, 'epoch': 2.48, 'throughput': 2593.15}
06/17/2024 20:12:50 - INFO - llamafactory.extras.callbacks - {'loss': 0.6024, 'learning_rate': 2.3721e-06, 'epoch': 2.56, 'throughput': 2588.49}
06/17/2024 20:13:28 - INFO - llamafactory.extras.callbacks - {'loss': 0.6484, 'learning_rate': 1.5562e-06, 'epoch': 2.64, 'throughput': 2591.26}
06/17/2024 20:14:10 - INFO - llamafactory.extras.callbacks - {'loss': 0.7137, 'learning_rate': 9.0736e-07, 'epoch': 2.72, 'throughput': 2585.69}
06/17/2024 20:14:56 - INFO - llamafactory.extras.callbacks - {'loss': 0.7770, 'learning_rate': 4.3025e-07, 'epoch': 2.80, 'throughput': 2589.29}
06/17/2024 20:15:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.6529, 'learning_rate': 1.2827e-07, 'epoch': 2.88, 'throughput': 2590.50}
06/17/2024 20:16:16 - INFO - llamafactory.extras.callbacks - {'loss': 0.6996, 'learning_rate': 3.5659e-09, 'epoch': 2.96, 'throughput': 2590.41}
06/17/2024 20:16:26 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/17/2024 20:16:26 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05
06/17/2024 20:16:27 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--Qwen--Qwen2-1.5B-Instruct/snapshots/ba1cf1846d7df0a0591d6c00649f57e798519da8/config.json
06/17/2024 20:16:27 - INFO - transformers.configuration_utils - Model config Qwen2Config { "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 1536, "initializer_range": 0.02, "intermediate_size": 8960, "max_position_embeddings": 32768, "max_window_layers": 28, "model_type": "qwen2", "num_attention_heads": 12, "num_hidden_layers": 28, "num_key_value_heads": 2, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.41.2", "use_cache": true, "use_sliding_window": false, "vocab_size": 151936 }
06/17/2024 20:16:27 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/tokenizer_config.json
06/17/2024 20:16:27 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05/special_tokens_map.json
06/17/2024 20:16:27 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/17/2024 20:16:27 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
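Training finished with the LoRA adapter and tokenizer files saved under saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05; the "No metric eval_loss to plot" warning only means no validation split was configured. One way to try the result, assuming the peft package is available and the final adapter files sit at the top of that output directory, is sketched below; the prompt is purely illustrative:

# Sketch: load the finished LoRA adapter on top of the base model for a quick test.
# The adapter path comes from the "Saving model checkpoint to ..." line above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2-1.5B-Chat/lora/train_2024-06-17-19-49-05"

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)
model = model.merge_and_unload()   # optional: fold the adapter into the base weights

messages = [{"role": "user", "content": "What's the weather like in Beijing?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=128)[0], skip_special_tokens=True))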