---
library_name: peft
tags:
- generated_from_trainer
base_model: /GenAI4HW/llama2_13b
metrics:
- accuracy
model-index:
- name: outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config

axolotl version: `0.4.0`
```yaml
## General
# base_model: meta-llama/Meta-Llama-3-8B-Instruct
base_model: /GenAI4HW/llama2_13b
# base_model: meta-llama/Llama-2-13b
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# tokenizer_type: LlamaTokenizer
output_dir: ./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new
seed: 42

## Data Configuration
datasets:
  # - path: ./data/QuArch_v0_1_0_alpaca_w_context.json # With abstract
  # - path: ./data/QuArch_v0_1_1_alpaca_format.json # With justification
  # - path: ./data/QuArch_v0_1_0_alpaca_mmlu.json # Without justification
  - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
    type: alpaca
    data_file: train
dataset_prepared_path:
test_datasets:
  - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
    type: alpaca
    split: test
    data_file:
      - test
  # - path: ./data/QuArch_v0_1_1_alpaca_filtered_context/
  #   type: alpaca
  #   split: val
  #   data_file:
  #     - val

## Model Configuration
load_in_8bit: false
load_in_4bit: false
strict: false
bf16: auto
fp16:
tf32: false
device_map: 'auto'

## LoRA Configuration
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_model_dir:
lora_fan_in_fan_out:

## Logging Configuration
logging_dir: ./logs
logging_steps: 10
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
do_eval: true

## Training Configuration
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false
micro_batch_size: 1
gradient_accumulation_steps: 16
num_epochs: 30
warmup_steps: 10
weight_decay: 0.01
optimizer: adamw_torch
lr_scheduler: linear
learning_rate: 2e-5
gradient_checkpointing: false
saves_per_epoch: 1
# save_steps: 0
# save_strategy: steps
save_total_limit: 30
load_best_model_at_end: true
greater_is_better: true
early_stopping_patience:
resume_from_checkpoint:
remove_unused_columns: true

## Evaluation Configuration
eval_sample_packing: False
eval_batch_size: 1
evals_per_epoch: 1
# evaluation_strategy: epoch
eval_max_new_tokens: 32
eval_table_size:
# max_new_token: 32
# eval_causal_lm_metrics: sacrebleu

# Others
local_rank:
xformers_attention:
flash_attention: true
s2_attention:
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  # pad_token: <|end_of_text|>
```
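The config above attaches a LoRA adapter to `/GenAI4HW/llama2_13b` and writes it to `output_dir`. For reference, a minimal inference sketch follows, assuming those local paths are reachable and the adapter weights sit in `output_dir`; the Alpaca-style prompt string is purely illustrative.

```python
# Minimal inference sketch (paths taken from the config above; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/GenAI4HW/llama2_13b"  # base_model in the config
adapter_path = "./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new"  # output_dir

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,  # mirrors bf16: auto in the config
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA adapter
model.eval()

# Illustrative Alpaca-style prompt (the dataset uses type: alpaca).
prompt = "### Instruction:\nWhat is a cache hierarchy?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)  # matches eval_max_new_tokens
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Loading in bf16 mirrors the training precision; fp16 or 8-bit loading are drop-in alternatives if memory is tight.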

# outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new

This model is a LoRA adapter fine-tuned from `/GenAI4HW/llama2_13b` on the `QuArch_v0_1_1_alpaca_filtered_context` dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0432
- Accuracy: 0.9808

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- num_epochs: 30

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Accuracy |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| No log        | 0.2105  | 1    | 5.1322          | 0.6154   |
| No log        | 0.8421  | 4    | 5.1271          | 0.6346   |
| No log        | 1.6842  | 8    | 5.0601          | 0.6538   |
| 5.1323        | 2.5263  | 12   | 4.7743          | 0.7885   |
| 5.1323        | 3.3684  | 16   | 4.0491          | 0.9231   |
| 4.2735        | 4.2105  | 20   | 2.6444          | 0.8846   |
| 4.2735        | 5.0526  | 24   | 1.0551          | 0.9615   |
| 4.2735        | 5.8947  | 28   | 0.4698          | 0.6923   |
| 1.2232        | 6.7368  | 32   | 0.3224          | 0.6731   |
| 1.2232        | 7.5789  | 36   | 0.2527          | 1.0      |
| 0.3083        | 8.4211  | 40   | 0.1972          | 1.0      |
| 0.3083        | 9.2632  | 44   | 0.1372          | 0.9615   |
| 0.3083        | 10.1053 | 48   | 0.0803          | 1.0      |
| 0.1761        | 10.9474 | 52   | 0.0575          | 0.9808   |
| 0.1761        | 11.7895 | 56   | 0.0475          | 0.9808   |
| 0.116         | 12.6316 | 60   | 0.0444          | 0.9808   |
| 0.116         | 13.4737 | 64   | 0.0463          | 0.9808   |
| 0.116         | 14.3158 | 68   | 0.0489          | 0.9808   |
| 0.0814        | 15.1579 | 72   | 0.0495          | 0.9808   |
| 0.0814        | 16.0    | 76   | 0.0481          | 0.9808   |
| 0.0709        | 16.8421 | 80   | 0.0469          | 0.9808   |
| 0.0709        | 17.6842 | 84   | 0.0457          | 0.9808   |
| 0.0709        | 18.5263 | 88   | 0.0455          | 0.9808   |
| 0.0632        | 19.3684 | 92   | 0.0454          | 0.9808   |
| 0.0632        | 20.2105 | 96   | 0.0459          | 0.9808   |
| 0.0569        | 21.0526 | 100  | 0.0458          | 0.9808   |
| 0.0569        | 21.8947 | 104  | 0.0446          | 0.9808   |
| 0.0569        | 22.7368 | 108  | 0.0451          | 0.9808   |
| 0.055         | 23.5789 | 112  | 0.0446          | 0.9808   |
| 0.055         | 24.4211 | 116  | 0.0452          | 0.9808   |
| 0.0581        | 25.2632 | 120  | 0.0432          | 0.9808   |

### Framework versions

- PEFT 0.10.0
- Transformers 4.41.2
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
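If a standalone checkpoint is more convenient than loading the base model plus this adapter, the LoRA weights can be folded into the base model. A minimal sketch, assuming the same local paths as in the config; the merged output directory name is hypothetical.

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "/GenAI4HW/llama2_13b"
adapter_path = "./outputs/llama2-13B-lora-QuArch_0_1_1_alpaca_filtered-answer-context-test-new"
merged_path = "./outputs/llama2-13B-QuArch-merged"  # hypothetical destination

base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(merged_path)
```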