---
license: apache-2.0
base_model: h2oai/h2o-danube3-500m-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: clite7-500m-test-ckpts
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.4.1`

```yaml
# Weights and Biases logging config
wandb_project: clite
wandb_entity:
wandb_watch:
wandb_name: v7
wandb_log_model:

# Model architecture config
base_model: h2oai/h2o-danube3-500m-base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: anthropic

# Hugging Face saving config
hub_model_id:
hub_strategy:
push_dataset_to_hub:
hf_use_auth_token:

# Model checkpointing config
output_dir: ./lora-out
resume_from_checkpoint:
save_steps:
saves_per_epoch: 5
save_safetensors: true
save_total_limit: 2

# Mixed precision training config
bf16: true
fp16: false
tf32: false

# Model loading config
load_in_8bit: false
load_in_4bit: false
strict: false

# Sequence config
sequence_len: 8192
s2_attention: false
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
train_on_inputs: true
group_by_length: false

# Dataset config
datasets:
  - path: kalomaze/Opus_Instruct_3k
    type: chat_template
val_set_size: 0.1
evaluation_strategy:
eval_steps:
evals_per_epoch: 10
test_datasets:
dataset_prepared_path: ./last-preped-dataset
shuffle_merged_datasets: true

# Training hyperparameters
num_epochs: 3
gradient_accumulation_steps: 2
micro_batch_size: 8
eval_batch_size: 8
warmup_steps: 10
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00004
cosine_min_lr_ratio: 0.1
weight_decay: 0.1
max_grad_norm: 1
logging_steps: 1

# Model optimization
gradient_checkpointing: unsloth
xformers_attention: false
flash_attention: true
sdp_attention: false
unsloth_cross_entropy_loss: false
unsloth_lora_mlp: false
unsloth_lora_qkv: false
unsloth_lora_o: false

# Loss monitoring config
early_stopping_patience: false
loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

# Debug config
debug: true
seed: 02496

# DeepSpeed and FSDP config
deepspeed:
fsdp:
fsdp_config:

# Token config
special_tokens:
tokens: # these are delimiters
  - ""

# Checkpoint backing up
hub_model_id: Fizzarolli/clite7-500m-test-ckpts
hub_strategy: all_checkpoints
```
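The config combines `lr_scheduler: cosine`, `warmup_steps: 10`, `learning_rate: 0.00004`, and `cosine_min_lr_ratio: 0.1`. The sketch below shows one common way such a schedule is computed: linear warmup to the peak rate, then cosine decay to a floor of `min_lr_ratio * peak_lr`. It is an illustration under assumptions, not a copy of Axolotl's implementation. The function name `cosine_lr_with_warmup` is hypothetical, and `total_steps=30` is assumed from the last step logged in this card's training results.

```python
import math


def cosine_lr_with_warmup(step, peak_lr=4e-5, warmup_steps=10,
                          total_steps=30, min_lr_ratio=0.1):
    """Return the learning rate at a given optimizer step.

    Assumed formulation: linear warmup from 0 to peak_lr, then cosine
    decay from peak_lr down to min_lr_ratio * peak_lr at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale so the curve ends at the floor instead of zero.
    return peak_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


if __name__ == "__main__":
    for step in (0, 5, 10, 20, 30):
        print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the rate peaks at 4e-05 at step 10 and settles at 4e-06 by step 30, so the late-training updates are ten times smaller than the peak rather than vanishing entirely.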

[Visualize in Weights & Biases](https://wandb.ai/ruthenic/clite/runs/diil6zl9)

# clite7-500m-test-ckpts

This model is a fine-tuned version of [h2oai/h2o-danube3-500m-base](https://huggingface.co/h2oai/h2o-danube3-500m-base) on the kalomaze/Opus_Instruct_3k dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3765

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 2496
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.9517        | 0.0952 | 1    | 3.7616          |
| 2.9796        | 0.1905 | 2    | 3.6462          |
| 2.9632        | 0.2857 | 3    | 3.3357          |
| 2.6639        | 0.3810 | 4    | 3.0408          |
| 2.5048        | 0.4762 | 5    | 2.7322          |
| 2.4911        | 0.5714 | 6    | 2.5094          |
| 2.1291        | 0.6667 | 7    | 2.3554          |
| 4.8452        | 0.7619 | 8    | 1.6418          |
| 1.6902        | 0.8571 | 9    | 1.6067          |
| 1.6166        | 0.9524 | 10   | 1.5581          |
| 1.5985        | 1.0476 | 11   | 1.5162          |
| 1.5001        | 1.0476 | 12   | 1.4847          |
| 1.4679        | 1.1429 | 13   | 1.4601          |
| 1.4981        | 1.2381 | 14   | 1.4440          |
| 1.4864        | 1.3333 | 15   | 1.4293          |
| 1.4895        | 1.4286 | 16   | 1.4174          |
| 1.4653        | 1.5238 | 17   | 1.4061          |
| 1.4447        | 1.6190 | 18   | 1.3988          |
| 1.4492        | 1.7143 | 19   | 1.3937          |
| 1.4244        | 1.8095 | 20   | 1.3896          |
| 1.4319        | 1.9048 | 21   | 1.3858          |
| 1.4238        | 2.0    | 22   | 1.3830          |
| 1.4725        | 2.0952 | 23   | 1.3810          |
| 1.3862        | 2.0952 | 24   | 1.3794          |
| 1.3526        | 2.1905 | 25   | 1.3783          |
| 1.4134        | 2.2857 | 26   | 1.3776          |
| 1.3909        | 2.3810 | 27   | 1.3771          |
| 1.4016        | 2.4762 | 28   | 1.3769          |
| 1.3494        | 2.5714 | 29   | 1.3766          |
| 1.3783        | 2.6667 | 30   | 1.3765          |

### Framework versions

- Transformers 4.42.4
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
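Two figures reported in this card can be cross-checked with simple arithmetic: the `total_train_batch_size` of 16 and the final evaluation loss of 1.3765. The sketch below is a rough illustration, not part of the original training code. It assumes a single training device (`num_devices = 1`, which is consistent with the reported total of 16 but not stated in the card) and that the reported loss is a mean per-token cross-entropy in nats, as is typical for the Hugging Face Trainer.

```python
import math

# Values taken from the card and config.
micro_batch_size = 8
gradient_accumulation_steps = 2
sequence_len = 8192
final_val_loss = 1.3765

# Assumption: one device, not stated explicitly in the card.
num_devices = 1

# Packed sequences consumed per optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_devices

# Upper bound on tokens per optimizer step, since sequences are packed
# and padded to sequence_len.
max_tokens_per_step = sequences_per_step * sequence_len

# Perplexity, assuming the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(final_val_loss)

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
print(f"validation perplexity: {perplexity:.3f}")
```

Under these assumptions each optimizer step covers 16 packed sequences (at most 131072 tokens), and the final validation loss corresponds to a perplexity of roughly 3.96.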