initial model
- README.md +156 -0
- adapter_config.json +18 -0
- adapter_model.bin +3 -0
- checkpoint-200/optimizer.pt +3 -0
- checkpoint-200/pytorch_model.bin +3 -0
- checkpoint-200/rng_state_0.pth +3 -0
- checkpoint-200/rng_state_1.pth +3 -0
- checkpoint-200/scaler.pt +3 -0
- checkpoint-200/scheduler.pt +3 -0
- checkpoint-200/trainer_state.json +144 -0
- checkpoint-200/training_args.bin +3 -0
- checkpoint-400/optimizer.pt +3 -0
- checkpoint-400/pytorch_model.bin +3 -0
- checkpoint-400/rng_state_0.pth +3 -0
- checkpoint-400/rng_state_1.pth +3 -0
- checkpoint-400/scaler.pt +3 -0
- checkpoint-400/scheduler.pt +3 -0
- checkpoint-400/trainer_state.json +272 -0
- checkpoint-400/training_args.bin +3 -0
README.md
ADDED
@@ -0,0 +1,156 @@
## Setup Notes

For this model, a VM with 2 T4 GPUs was used.

To train on both GPUs simultaneously, the following command was used to launch training:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16

Note 1. The micro batch size was increased from the default of 4 to 16. Based on other training runs, it could likely be increased further; this was a first attempt. The batch arithmetic behind these flags is sketched below.

Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.

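For reference, here is a minimal sketch of the effective-batch arithmetic behind these flags, assuming alpaca-lora's finetune.py behavior and the batch_size of 128 echoed in the log below:

```python
# Effective-batch arithmetic for the launch command above (a sketch; values
# mirror the params echoed at the start of the training log).
batch_size = 128             # target examples per optimizer step
micro_batch_size = 16        # per-GPU examples per forward/backward pass
world_size = 2               # WORLD_SIZE=2, one process per T4

# The target batch is split across micro-batches and GPUs:
gradient_accumulation_steps = batch_size // micro_batch_size // world_size
print(gradient_accumulation_steps)  # 4

# Sanity check: 598 max_steps * 128 examples/step is roughly 76.5k training
# examples, i.e. the dataset minus the 2000-example validation split.
```
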
## Log

(sqltest) chrisdono@deep-learning-duo-t4-3:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'b-mc2/sql-create-context' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 16
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: b-mc2/sql-create-context
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 16
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00,  2.57s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:24<00:00,  2.57s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
  0%|          | 0/1 [00:00<?, ?it/s]
Found cached dataset json (/home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.30it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.83it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
Loading cached split indices for dataset at /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-5a5ac0bd39fc20e0.arrow and /home/chrisdono/.cache/huggingface/datasets/b-mc2___json/b-mc2--sql-create-context-d62c31544f758e00/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-782fec259d4b8f6a.arrow
{'loss': 2.7003, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 2.566, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.03}
{'loss': 2.2648, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.05}
{'loss': 1.657, 'learning_rate': 0.00011099999999999999, 'epoch': 0.07}
{'loss': 1.1599, 'learning_rate': 0.00014099999999999998, 'epoch': 0.08}
{'loss': 0.9037, 'learning_rate': 0.00017099999999999998, 'epoch': 0.1}
{'loss': 0.8137, 'learning_rate': 0.000201, 'epoch': 0.12}
{'loss': 0.7827, 'learning_rate': 0.00023099999999999998, 'epoch': 0.13}
{'loss': 0.7554, 'learning_rate': 0.000261, 'epoch': 0.15}
{'loss': 0.7357, 'learning_rate': 0.00029099999999999997, 'epoch': 0.17}
{'loss': 0.6893, 'learning_rate': 0.0002957831325301205, 'epoch': 0.18}
{'loss': 0.6606, 'learning_rate': 0.00028975903614457827, 'epoch': 0.2}
{'loss': 0.6506, 'learning_rate': 0.0002837349397590361, 'epoch': 0.22}
{'loss': 0.6462, 'learning_rate': 0.00027771084337349395, 'epoch': 0.23}
{'loss': 0.6315, 'learning_rate': 0.0002716867469879518, 'epoch': 0.25}
{'loss': 0.6337, 'learning_rate': 0.0002656626506024096, 'epoch': 0.27}
{'loss': 0.6223, 'learning_rate': 0.00025963855421686746, 'epoch': 0.28}
{'loss': 0.6136, 'learning_rate': 0.00025361445783132525, 'epoch': 0.3}
{'loss': 0.6198, 'learning_rate': 0.00024759036144578314, 'epoch': 0.32}
{'loss': 0.6084, 'learning_rate': 0.00024156626506024095, 'epoch': 0.33}
{'eval_loss': 0.608456552028656, 'eval_runtime': 123.856, 'eval_samples_per_second': 16.148, 'eval_steps_per_second': 1.009, 'epoch': 0.33}
{'loss': 0.6021, 'learning_rate': 0.00023554216867469876, 'epoch': 0.35}
{'loss': 0.5949, 'learning_rate': 0.0002295180722891566, 'epoch': 0.37}
{'loss': 0.5972, 'learning_rate': 0.00022349397590361444, 'epoch': 0.38}
{'loss': 0.5922, 'learning_rate': 0.00021746987951807228, 'epoch': 0.4}
{'loss': 0.5876, 'learning_rate': 0.0002114457831325301, 'epoch': 0.42}
{'loss': 0.5788, 'learning_rate': 0.00020542168674698793, 'epoch': 0.43}
{'loss': 0.5894, 'learning_rate': 0.0001993975903614458, 'epoch': 0.45}
{'loss': 0.5877, 'learning_rate': 0.0001933734939759036, 'epoch': 0.47}
{'loss': 0.5835, 'learning_rate': 0.00018734939759036142, 'epoch': 0.48}
{'loss': 0.5791, 'learning_rate': 0.00018132530120481925, 'epoch': 0.5}
{'loss': 0.5841, 'learning_rate': 0.00017530120481927712, 'epoch': 0.52}
{'loss': 0.5728, 'learning_rate': 0.00016927710843373493, 'epoch': 0.53}
{'loss': 0.569, 'learning_rate': 0.00016325301204819274, 'epoch': 0.55}
{'loss': 0.5709, 'learning_rate': 0.00015722891566265058, 'epoch': 0.57}
{'loss': 0.5762, 'learning_rate': 0.00015120481927710845, 'epoch': 0.58}
{'loss': 0.5704, 'learning_rate': 0.00014518072289156626, 'epoch': 0.6}
{'loss': 0.5661, 'learning_rate': 0.0001391566265060241, 'epoch': 0.62}
{'loss': 0.5662, 'learning_rate': 0.00013313253012048193, 'epoch': 0.63}
{'loss': 0.5674, 'learning_rate': 0.00012710843373493975, 'epoch': 0.65}
{'loss': 0.5635, 'learning_rate': 0.00012108433734939758, 'epoch': 0.67}
{'eval_loss': 0.568750262260437, 'eval_runtime': 122.9061, 'eval_samples_per_second': 16.273, 'eval_steps_per_second': 1.017, 'epoch': 0.67}
{'loss': 0.5609, 'learning_rate': 0.00011506024096385541, 'epoch': 0.69}
{'loss': 0.5724, 'learning_rate': 0.00010903614457831325, 'epoch': 0.7}
{'loss': 0.5603, 'learning_rate': 0.00010301204819277107, 'epoch': 0.72}
{'loss': 0.5599, 'learning_rate': 9.698795180722891e-05, 'epoch': 0.74}
{'loss': 0.5655, 'learning_rate': 9.096385542168674e-05, 'epoch': 0.75}
{'loss': 0.5578, 'learning_rate': 8.493975903614457e-05, 'epoch': 0.77}
{'loss': 0.5577, 'learning_rate': 7.89156626506024e-05, 'epoch': 0.79}
{'loss': 0.5606, 'learning_rate': 7.289156626506024e-05, 'epoch': 0.8}
{'loss': 0.5496, 'learning_rate': 6.686746987951806e-05, 'epoch': 0.82}
{'loss': 0.5635, 'learning_rate': 6.08433734939759e-05, 'epoch': 0.84}
{'loss': 0.5522, 'learning_rate': 5.481927710843373e-05, 'epoch': 0.85}
{'loss': 0.5572, 'learning_rate': 4.879518072289156e-05, 'epoch': 0.87}
{'loss': 0.5454, 'learning_rate': 4.2771084337349395e-05, 'epoch': 0.89}
{'loss': 0.5485, 'learning_rate': 3.6746987951807227e-05, 'epoch': 0.9}
{'loss': 0.5592, 'learning_rate': 3.072289156626506e-05, 'epoch': 0.92}
{'loss': 0.5499, 'learning_rate': 2.469879518072289e-05, 'epoch': 0.94}
{'loss': 0.55, 'learning_rate': 1.867469879518072e-05, 'epoch': 0.95}
{'loss': 0.5511, 'learning_rate': 1.2650602409638553e-05, 'epoch': 0.97}
{'loss': 0.5531, 'learning_rate': 6.626506024096385e-06, 'epoch': 0.99}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 27.59s/it]
{'train_runtime': 17131.1027, 'train_samples_per_second': 4.47, 'train_steps_per_second': 0.035, 'train_loss': 0.7246327424129116, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 598/598 [4:45:30<00:00, 28.65s/it]
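The per-step metrics printed above are also saved to trainer_state.json inside each checkpoint directory below. A minimal sketch for pulling the loss curves back out of those files, assuming the checkpoint layout of this commit:

```python
import json

# trainer_state.json is written by the HF Trainer next to each checkpoint.
with open("checkpoint-400/trainer_state.json") as f:
    state = json.load(f)

# Train entries carry 'loss'; eval entries carry 'eval_loss' instead.
train_curve = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_curve = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print(train_curve[-1])  # (400, 0.5635)
print(eval_curve)       # [(200, 0.6084...), (400, 0.5687...)]
```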
adapter_config.json
ADDED
@@ -0,0 +1,18 @@
{
  "base_model_name_or_path": "decapoda-research/llama-7b-hf",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
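This config plus adapter_model.bin is everything peft needs to attach the LoRA weights to the base model. A minimal loading sketch, assuming a transformers/peft version contemporary with this commit; the 8-bit loading is an assumption mirroring the bitsandbytes setup in the log, and the adapter path is illustrative:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

# Base model named in adapter_config.json above.
base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,          # assumption: matches the bitsandbytes training setup
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Attach the r=8 q_proj/v_proj LoRA weights from this repo (path illustrative).
model = PeftModel.from_pretrained(base, "./lora-alpaca")
model.eval()
```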
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1
size 16822989
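The .bin/.pt/.pth entries in this commit are git-lfs pointer files rather than the binaries themselves: each records just the spec version, a sha256 oid, and the payload size in bytes. As a small sketch (the local file name is illustrative), a downloaded artifact can be verified against its pointer:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file in 1 MiB chunks and return its sha256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# oid copied from the adapter_model.bin pointer above.
expected = "e42555cfb90f4ae1dea4f60b1fabeb4b0f835742f7ce5cb5db85a88b9bd56ab1"
assert sha256_of("adapter_model.bin") == expected
```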
checkpoint-200/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:30aed971820530f6c2286b67813628eb04406f7a8ff8684a5c487d4703ff526a
size 33661637
checkpoint-200/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:116aaa0e06aeb94d0275cb05d94d3de6ffa0777157bdb82c747abb8d64ce1e9e
size 16822989
checkpoint-200/rng_state_0.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e16e9f14f9e54b23b5659b8c0c8de15af3106d2fd7100c84972de9f0edddff5a
size 14583
checkpoint-200/rng_state_1.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:771e4f2aee11e3618b4bdba3329b0dd8fffdec0bc052288b1bac58fc0f1db62b
size 14583
checkpoint-200/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:915e54e5670dd4bd7730bf2512f0efeebe5737f3917cb2161e14f0fc5045e265
size 557
checkpoint-200/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c3951795851fd7473327989d296caf1d37edcfc8f96fd0f7135134b43e8e969
size 627
checkpoint-200/trainer_state.json
ADDED
@@ -0,0 +1,144 @@
{
  "best_metric": 0.608456552028656,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-200",
  "epoch": 0.3341687552213868,
  "global_step": 200,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 1.7194991763849216e+17,
  "trial_name": null,
  "trial_params": null
}
checkpoint-200/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579
checkpoint-400/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a64a61a46978c6b08f923fbc3765ae8f2425a7f416cdf7d53ff324dbaafa95df
size 33661637
checkpoint-400/pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:583547922fd1f6caffa9b89e8e9198da03a6edbb83afc3a4d197a14afebcfda6
size 16822989
checkpoint-400/rng_state_0.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b82e02b2fa07ca70b10b69c969d68a430593f680faba711e125cd22f71d22091
size 14583
checkpoint-400/rng_state_1.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ad758f3149c3dc3140b14c686eb5e4c86f1100296833b4d78d25050a3488c2ba
size 14583
checkpoint-400/scaler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f5554cdd695760baf91670b248e922967e34f8996685d064b22c06d043999e2
size 557
checkpoint-400/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1720e92837945289e11edece0320c6d9f5e447f776cad627cbb852f656661b20
size 627
checkpoint-400/trainer_state.json
ADDED
@@ -0,0 +1,272 @@
{
  "best_metric": 0.568750262260437,
  "best_model_checkpoint": "./lora-alpaca/checkpoint-400",
  "epoch": 0.6683375104427736,
  "global_step": 400,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.02,
      "learning_rate": 2.9999999999999997e-05,
      "loss": 2.7003,
      "step": 10
    },
    {
      "epoch": 0.03,
      "learning_rate": 5.9999999999999995e-05,
      "loss": 2.566,
      "step": 20
    },
    {
      "epoch": 0.05,
      "learning_rate": 8.999999999999999e-05,
      "loss": 2.2648,
      "step": 30
    },
    {
      "epoch": 0.07,
      "learning_rate": 0.00011099999999999999,
      "loss": 1.657,
      "step": 40
    },
    {
      "epoch": 0.08,
      "learning_rate": 0.00014099999999999998,
      "loss": 1.1599,
      "step": 50
    },
    {
      "epoch": 0.1,
      "learning_rate": 0.00017099999999999998,
      "loss": 0.9037,
      "step": 60
    },
    {
      "epoch": 0.12,
      "learning_rate": 0.000201,
      "loss": 0.8137,
      "step": 70
    },
    {
      "epoch": 0.13,
      "learning_rate": 0.00023099999999999998,
      "loss": 0.7827,
      "step": 80
    },
    {
      "epoch": 0.15,
      "learning_rate": 0.000261,
      "loss": 0.7554,
      "step": 90
    },
    {
      "epoch": 0.17,
      "learning_rate": 0.00029099999999999997,
      "loss": 0.7357,
      "step": 100
    },
    {
      "epoch": 0.18,
      "learning_rate": 0.0002957831325301205,
      "loss": 0.6893,
      "step": 110
    },
    {
      "epoch": 0.2,
      "learning_rate": 0.00028975903614457827,
      "loss": 0.6606,
      "step": 120
    },
    {
      "epoch": 0.22,
      "learning_rate": 0.0002837349397590361,
      "loss": 0.6506,
      "step": 130
    },
    {
      "epoch": 0.23,
      "learning_rate": 0.00027771084337349395,
      "loss": 0.6462,
      "step": 140
    },
    {
      "epoch": 0.25,
      "learning_rate": 0.0002716867469879518,
      "loss": 0.6315,
      "step": 150
    },
    {
      "epoch": 0.27,
      "learning_rate": 0.0002656626506024096,
      "loss": 0.6337,
      "step": 160
    },
    {
      "epoch": 0.28,
      "learning_rate": 0.00025963855421686746,
      "loss": 0.6223,
      "step": 170
    },
    {
      "epoch": 0.3,
      "learning_rate": 0.00025361445783132525,
      "loss": 0.6136,
      "step": 180
    },
    {
      "epoch": 0.32,
      "learning_rate": 0.00024759036144578314,
      "loss": 0.6198,
      "step": 190
    },
    {
      "epoch": 0.33,
      "learning_rate": 0.00024156626506024095,
      "loss": 0.6084,
      "step": 200
    },
    {
      "epoch": 0.33,
      "eval_loss": 0.608456552028656,
      "eval_runtime": 123.856,
      "eval_samples_per_second": 16.148,
      "eval_steps_per_second": 1.009,
      "step": 200
    },
    {
      "epoch": 0.35,
      "learning_rate": 0.00023554216867469876,
      "loss": 0.6021,
      "step": 210
    },
    {
      "epoch": 0.37,
      "learning_rate": 0.0002295180722891566,
      "loss": 0.5949,
      "step": 220
    },
    {
      "epoch": 0.38,
      "learning_rate": 0.00022349397590361444,
      "loss": 0.5972,
      "step": 230
    },
    {
      "epoch": 0.4,
      "learning_rate": 0.00021746987951807228,
      "loss": 0.5922,
      "step": 240
    },
    {
      "epoch": 0.42,
      "learning_rate": 0.0002114457831325301,
      "loss": 0.5876,
      "step": 250
    },
    {
      "epoch": 0.43,
      "learning_rate": 0.00020542168674698793,
      "loss": 0.5788,
      "step": 260
    },
    {
      "epoch": 0.45,
      "learning_rate": 0.0001993975903614458,
      "loss": 0.5894,
      "step": 270
    },
    {
      "epoch": 0.47,
      "learning_rate": 0.0001933734939759036,
      "loss": 0.5877,
      "step": 280
    },
    {
      "epoch": 0.48,
      "learning_rate": 0.00018734939759036142,
      "loss": 0.5835,
      "step": 290
    },
    {
      "epoch": 0.5,
      "learning_rate": 0.00018132530120481925,
      "loss": 0.5791,
      "step": 300
    },
    {
      "epoch": 0.52,
      "learning_rate": 0.00017530120481927712,
      "loss": 0.5841,
      "step": 310
    },
    {
      "epoch": 0.53,
      "learning_rate": 0.00016927710843373493,
      "loss": 0.5728,
      "step": 320
    },
    {
      "epoch": 0.55,
      "learning_rate": 0.00016325301204819274,
      "loss": 0.569,
      "step": 330
    },
    {
      "epoch": 0.57,
      "learning_rate": 0.00015722891566265058,
      "loss": 0.5709,
      "step": 340
    },
    {
      "epoch": 0.58,
      "learning_rate": 0.00015120481927710845,
      "loss": 0.5762,
      "step": 350
    },
    {
      "epoch": 0.6,
      "learning_rate": 0.00014518072289156626,
      "loss": 0.5704,
      "step": 360
    },
    {
      "epoch": 0.62,
      "learning_rate": 0.0001391566265060241,
      "loss": 0.5661,
      "step": 370
    },
    {
      "epoch": 0.63,
      "learning_rate": 0.00013313253012048193,
      "loss": 0.5662,
      "step": 380
    },
    {
      "epoch": 0.65,
      "learning_rate": 0.00012710843373493975,
      "loss": 0.5674,
      "step": 390
    },
    {
      "epoch": 0.67,
      "learning_rate": 0.00012108433734939758,
      "loss": 0.5635,
      "step": 400
    },
    {
      "epoch": 0.67,
      "eval_loss": 0.568750262260437,
      "eval_runtime": 122.9061,
      "eval_samples_per_second": 16.273,
      "eval_steps_per_second": 1.017,
      "step": 400
    }
  ],
  "max_steps": 598,
  "num_train_epochs": 1,
  "total_flos": 3.4431112456647475e+17,
  "trial_name": null,
  "trial_params": null
}
checkpoint-400/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84c05e6ae753d45577a258d454e2eb5c09dc6745e975434d69ba5432ffa022e7
size 3579