---
license: apache-2.0
base_model: Afterparty-hf/pretrain-0.924
tags:
- axolotl
- generated_from_trainer
model-index:
- name: finetune-0.559
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

See axolotl config

axolotl version: `0.4.1`

```yaml
base_model: Afterparty-hf/pretrain-0.924
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Afterparty-hf/synthetic-instruct
    type: sharegpt
  - path: Afterparty-hf/train-format-server
    type: sharegpt
  - path: Afterparty-hf/help-channels-formatted
    type: sharegpt
  - path: Afterparty-hf/constt-augmented
    type: sharegpt
  - path: Afterparty-hf/transcripts-train
    type: sharegpt
chat_template: chatml
dataset_prepared_path: ./prepath
hub_model_id: Afterparty-hf/finetune-0.559
wandb_project: ap_publi
hf_use_auth_token: true
output_dir: ./finetune-559-a
resume_from_checkpoint: ./finetune-559/checkpoint-1026
wandb_watch: all
hub_private_repo: true
hub_strategy: all_checkpoints
push_to_hub: false
hf_use_auth_token: true
max_grad_norm: 0.6
sequence_len: 14256
sample_packing: true
pad_to_sequence_len: true
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 4
learning_rate: 0.000004
optimizer: adamw_bnb_8bit
#optim_args:
#  amsgrad: true
lr_scheduler: cosine
train_on_inputs: false
group_by_length: false
bfloat16: false
#bf16: auto
fp16:
tf32: false
neftune_noise_alpha: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
logging_steps: 1
xformers_attention:
flash_attention: true
#unsloth_lora_mlp: true
#unsloth_lora_qkv: true
#unsloth_lora_o: true
#flash_attn_cross_entropy: true
#flash_attn_rms_norm: true
#flash_attn_fuse_qkv: false
#flash_attn_fuse_mlp: true
warmup_ratio: 0.5
evals_per_step: 0.025
eval_table_size:
saves_per_epoch: 5
debug:
torch_compile: true
rank:
deepspeed: deepspeed_configs/zero2.json
save_safetensors: true
weight_decay: 0.01
special_tokens:
  bos_token: "~~"
  eos_token: "~~"
  unk_token: ""
  pad_token: ""
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"
```
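The config above combines `type: sharegpt` datasets with `chat_template: chatml` and adds `<|im_start|>` and `<|im_end|>` as delimiter tokens. As a rough illustration of what that pairing implies, the sketch below renders a ShareGPT-style conversation as ChatML text. The helper `to_chatml`, the role mapping, and the example conversation are hypothetical and are not taken from axolotl; the exact whitespace and loss masking that axolotl applies (for instance under `train_on_inputs: false`) may differ.

```python
# Assumed mapping from ShareGPT speaker names to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_chatml(conversation):
    """Render a list of ShareGPT turns ({"from": ..., "value": ...}) as ChatML."""
    parts = []
    for turn in conversation:
        role = ROLE_MAP[turn["from"]]
        # Each turn is wrapped in the two delimiter tokens from the config.
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)


# Made-up conversation, for illustration only.
example = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi, how can I help?"},
]

print(to_chatml(example))
```

At inference time the same layout would normally be produced by the tokenizer's chat template rather than by hand, so this helper is only useful for inspecting what a formatted sample looks like.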

# finetune-0.559

This model is a fine-tuned version of [Afterparty-hf/pretrain-0.924](https://huggingface.co/Afterparty-hf/pretrain-0.924) on the five sharegpt-format datasets listed in the axolotl config above.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 310
- num_epochs: 4

### Training results

### Framework versions

- Transformers 4.41.1
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
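
The hyperparameters listed above can be cross-checked with a little arithmetic. The sketch below recomputes the effective batch size and the upper bound on tokens per optimizer step, then evaluates a linear-warmup cosine schedule of the usual half-cycle form. The total step count is not stated in this card; `assumed_total_steps` is inferred from `lr_scheduler_warmup_steps: 310` and `warmup_ratio: 0.5` and may not match the actual run, and `lr_at` is a hypothetical helper rather than the trainer's own code.

```python
import math

# Values copied from the config and the hyperparameter list.
micro_batch_size = 1
gradient_accumulation_steps = 1
num_devices = 8
sequence_len = 14256
base_lr = 4e-6
warmup_steps = 310
warmup_ratio = 0.5

# Effective batch size across all devices per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# With sample packing and padding to sequence_len, each sample holds
# sequence_len tokens, so this is the token budget per optimizer step.
max_tokens_per_step = total_train_batch_size * sequence_len

# ASSUMPTION: total steps inferred from warmup_steps / warmup_ratio.
assumed_total_steps = round(warmup_steps / warmup_ratio)


def lr_at(step, total_steps=assumed_total_steps):
    """Linear warmup followed by half-cycle cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 8
print(max_tokens_per_step)     # 114048
print(assumed_total_steps)     # 620
```

Under these assumptions the learning rate peaks at 4e-06 at step 310, halfway through the run, which is an unusually long warmup for a cosine schedule.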