Setup Notes
Training was run on a VM with two NVIDIA T4 GPUs. To utilize both GPUs simultaneously, training was launched with the following command:
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'wikisql' --output_dir './lora-alpaca' --num_epochs 1 --micro_batch_size 32
Note 1. The micro batch size was increased from the default of 4 to 32 (see the accumulation sketch below).
Note 2. The output directory was initially lora-alpaca; its contents were moved to a new folder when the git repository was initialized.
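Raising the micro batch size does not change the effective batch size: finetune.py derives its gradient accumulation steps from batch_size and micro_batch_size, and divides by the world size under DDP. A minimal sketch of that arithmetic (based on a reading of alpaca-lora's finetune.py; verify against your checkout):

```python
# Effective batch size under DDP, following the logic in alpaca-lora's
# finetune.py (a sketch; verify against your checkout of the script).
batch_size, micro_batch_size, world_size = 128, 32, 2

grad_accum = batch_size // micro_batch_size  # 4
grad_accum //= world_size                    # 2 accumulation steps per GPU

# Each optimizer step therefore sees 2 (accum) x 32 (micro) x 2 (GPUs)
# = 128 samples, matching the batch_size printed in the log below.
print(grad_accum)  # 2
```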
Log
(sqltest) chrisdono4@deep-learning-duo-t4-4:~/alpaca-lora$ WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=1234 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'wikisql' --output_dir './lora-alpaca' --micro_batch_size 32 --num_epochs 1
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/sqltest did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/sqltest/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: wikisql
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 32
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca
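The LoRA hyperparameters printed above correspond to a PEFT configuration along these lines (a sketch of what finetune.py constructs, not a verbatim excerpt):

```python
# Approximate PEFT configuration implied by the printed hyperparameters
# (a sketch; finetune.py builds something close to this).
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)
```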
Loading checkpoint shards: 100%|████████████████████████████████████████| 33/33 [01:24<00:00, 2.57s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 33/33 [01:25<00:00, 2.58s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
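This tokenizer warning (printed once per process) is benign here: the decapoda-research checkpoint's tokenizer_config.json predates the transformers rename from LLaMATokenizer to LlamaTokenizer. Loading the class explicitly avoids the mismatch, e.g.:

```python
# Loading the tokenizer class explicitly sidesteps the LLaMATokenizer /
# LlamaTokenizer naming mismatch in this checkpoint's tokenizer_config.json.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
```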
Found cached dataset wikisql (/home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d)
0%| | 0/3 [00:00<?, ?it/s]
Found cached dataset wikisql (/home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d)
100%|████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 39.74it/s]
100%|████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 26.05it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
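The trainable-parameter count checks out against LLaMA-7B's dimensions (32 decoder layers, hidden size 4096) with r=8 LoRA adapters on q_proj and v_proj:

```python
# Back-of-the-envelope check of the "trainable params" line, assuming
# LLaMA-7B dimensions: 32 decoder layers, hidden size 4096.
n_layers, d_model, r = 32, 4096, 8

per_module = r * d_model + d_model * r   # lora_A (8x4096) + lora_B (4096x8)
trainable = n_layers * 2 * per_module    # q_proj and v_proj in every layer
print(trainable)                         # 4194304
print(100 * trainable / 6742609920)      # ~0.0622 %
```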
Loading cached split indices for dataset at /home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d/cache-bccdadf4048a2d5b.arrow and /home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d/cache-f8d5ea283d842b5a.arrow
Loading cached split indices for dataset at /home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d/cache-bccdadf4048a2d5b.arrow and /home/chrisdono4/.cache/huggingface/datasets/wikisql/default/0.1.0/7037bfe6a42b1ca2b6ac3ccacba5253b1825d31379e9cc626fc79a620977252d/cache-f8d5ea283d842b5a.arrow
{'loss': 2.0163, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.02}
{'loss': 1.9284, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.05}
{'loss': 1.77, 'learning_rate': 8.999999999999999e-05, 'epoch': 0.07}
{'loss': 1.3452, 'learning_rate': 0.00011999999999999999, 'epoch': 0.09}
{'loss': 0.9243, 'learning_rate': 0.00015, 'epoch': 0.12}
{'loss': 0.8385, 'learning_rate': 0.00017999999999999998, 'epoch': 0.14}
{'loss': 0.7986, 'learning_rate': 0.00020999999999999998, 'epoch': 0.16}
{'loss': 0.7786, 'learning_rate': 0.00023999999999999998, 'epoch': 0.19}
{'loss': 0.75, 'learning_rate': 0.00027, 'epoch': 0.21}
{'loss': 0.7389, 'learning_rate': 0.0003, 'epoch': 0.24}
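The learning-rate column traces a linear warmup to the peak of 3e-4 over the first 100 steps (loss is logged every 10 steps), followed by linear decay to zero at step 425. Assuming the Hugging Face Trainer's default linear scheduler with warmup_steps=100, the trajectory can be reproduced as:

```python
# Sketch of the LR trajectory above: linear warmup over the first 100
# steps, then linear decay to 0 at step 425 (assumes the HF Trainer's
# "linear" scheduler with warmup_steps=100).
def lr_at(step, peak=3e-4, warmup=100, total=425):
    if step < warmup:
        return peak * step / warmup
    return peak * (total - step) / (total - warmup)

print(lr_at(10))   # 3e-05      (first logged value)
print(lr_at(100))  # 0.0003     (peak, at epoch 0.24)
print(lr_at(110))  # ~0.0002908 (first decay value, at epoch 0.26)
```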
{'loss': 0.7248, 'learning_rate': 0.00029076923076923073, 'epoch': 0.26}
{'loss': 0.7199, 'learning_rate': 0.0002815384615384615, 'epoch': 0.28}
{'loss': 0.7159, 'learning_rate': 0.0002723076923076923, 'epoch': 0.31}
{'loss': 0.7029, 'learning_rate': 0.00026307692307692306, 'epoch': 0.33}
{'loss': 0.6851, 'learning_rate': 0.0002538461538461538, 'epoch': 0.35}
{'loss': 0.6935, 'learning_rate': 0.0002446153846153846, 'epoch': 0.38}
{'loss': 0.6737, 'learning_rate': 0.00023538461538461536, 'epoch': 0.4}
{'loss': 0.682, 'learning_rate': 0.00022615384615384614, 'epoch': 0.42}
{'loss': 0.667, 'learning_rate': 0.0002169230769230769, 'epoch': 0.45}
{'loss': 0.6731, 'learning_rate': 0.00020769230769230766, 'epoch': 0.47}
{'eval_loss': 0.6641973853111267, 'eval_runtime': 178.902, 'eval_samples_per_second': 11.179, 'eval_steps_per_second': 0.699, 'epoch': 0.47}
{'loss': 0.6631, 'learning_rate': 0.00019846153846153844, 'epoch': 0.49}
{'loss': 0.6652, 'learning_rate': 0.0001892307692307692, 'epoch': 0.52}
{'loss': 0.6591, 'learning_rate': 0.00017999999999999998, 'epoch': 0.54}
{'loss': 0.6605, 'learning_rate': 0.00017076923076923074, 'epoch': 0.56}
{'loss': 0.653, 'learning_rate': 0.00016153846153846153, 'epoch': 0.59}
{'loss': 0.6574, 'learning_rate': 0.00015230769230769228, 'epoch': 0.61}
{'loss': 0.6545, 'learning_rate': 0.00014307692307692307, 'epoch': 0.64}
{'loss': 0.6328, 'learning_rate': 0.00013384615384615385, 'epoch': 0.66}
{'loss': 0.6485, 'learning_rate': 0.0001246153846153846, 'epoch': 0.68}
{'loss': 0.6477, 'learning_rate': 0.00011538461538461538, 'epoch': 0.71}
{'loss': 0.639, 'learning_rate': 0.00010615384615384615, 'epoch': 0.73}
{'loss': 0.6384, 'learning_rate': 9.692307692307692e-05, 'epoch': 0.75}
{'loss': 0.6338, 'learning_rate': 8.76923076923077e-05, 'epoch': 0.78}
{'loss': 0.6394, 'learning_rate': 7.846153846153845e-05, 'epoch': 0.8}
82%|█████████████████████████████████████████         | 348/425 [3:57:23<51:48, 40.37s/it]
{'loss': 0.6345, 'learning_rate': 6.923076923076922e-05, 'epoch': 0.82}
{'loss': 0.6424, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.85}
{'loss': 0.6271, 'learning_rate': 5.0769230769230766e-05, 'epoch': 0.87}
{'loss': 0.6267, 'learning_rate': 4.153846153846154e-05, 'epoch': 0.89}
{'loss': 0.642, 'learning_rate': 3.230769230769231e-05, 'epoch': 0.92}
{'loss': 0.6389, 'learning_rate': 2.3076923076923076e-05, 'epoch': 0.94}
{'eval_loss': 0.6302221417427063, 'eval_runtime': 177.453, 'eval_samples_per_second': 11.271, 'eval_steps_per_second': 0.704, 'epoch': 0.94}
{'loss': 0.6224, 'learning_rate': 1.3846153846153845e-05, 'epoch': 0.96}
{'loss': 0.6361, 'learning_rate': 4.615384615384615e-06, 'epoch': 0.99}
100%|██████████████████████████████████████████████████| 425/425 [4:52:00<00:00, 36.53s/it]
{'train_runtime': 17520.706, 'train_samples_per_second': 3.102, 'train_steps_per_second': 0.024, 'train_loss': 0.7834248065948486, 'epoch': 1.0}
100%|██████████████████████████████████████████████████| 425/425 [4:52:00<00:00, 41.22s/it]
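The 425 optimizer steps are consistent with one epoch over WikiSQL's train split at an effective batch size of 128: 56,355 training examples minus the 2,000 held out for validation gives 54,355, and 54,355 / 128 rounds up to 425. To use the resulting adapter for inference, something along these lines should work (a sketch, assuming the './lora-alpaca' output directory above):

```python
# Minimal inference sketch for the trained adapter (a sketch, assuming
# the './lora-alpaca' output directory produced by the run above).
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "./lora-alpaca")
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
```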