---
license: apache-2.0
datasets:
  - teknium/OpenHermes-2.5
  - abhinand/ultrachat_200k_sharegpt
language:
  - en
---

# TinyLLaMA OpenHermes2.5 [Work in Progress] (Quantized)

This is a finetune of the TinyLLaMA base model, trained on [OpenHermes 2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) and [UltraChat 200k](https://huggingface.co/datasets/abhinand/ultrachat_200k_sharegpt) for a single epoch.

Training was generously supported by [Jarvislabs.ai](https://jarvislabs.ai/).

If you appreciate this work and would like to support its continued development, consider [buying me a coffee](https://www.buymeacoffee.com/abhinand.b). Your support is invaluable and greatly appreciated.

[!["Buy Me A Coffee"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/abhinand.b)
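The model is trained on ChatML-formatted conversations (see the axolotl config below). As a minimal inference sketch, not taken from the original card, the snippet below loads the checkpoint under the `hub_model_id` from the training config and generates a reply through the tokenizer's chat template; the repo id and the presence of a ChatML chat template in the published tokenizer are assumptions here.

```python
# Minimal inference sketch (assumptions: the merged chat model is published under
# the hub_model_id from the config below, and its tokenizer ships a ChatML chat
# template; adjust the repo id if the final weights live elsewhere).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v1.0"  # assumption: from hub_model_id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 per the training config; fall back to float16/float32 if unsupported
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain LoRA fine-tuning in two sentences."},
]

# apply_chat_template renders the ChatML format (<|im_start|> / <|im_end|>)
# the model was finetuned on, then tokenizes it.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```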
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
is_llama_derived_model: true

# huggingface repo
datasets:
  - path: teknium/OpenHermes-2.5
    type: sharegpt
    conversation: chatml
    train_on_split: train

  - path: abhinand/ultrachat_200k_sharegpt
    type: sharegpt
    conversation: chatml
    train_on_split: train

load_in_4bit: false
load_in_8bit: false
bf16: true # require >=ampere
chat_template: chatml

dataset_prepared_path: last_run_prepared_path
hub_model_id: abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v1.0

group_by_length: false

val_set_size: 0.0
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

output_dir: /home/tiny-llama/trained_models

gradient_accumulation_steps: 2
micro_batch_size: 32
eval_batch_size: 32
num_epochs: 1
logging_steps: 1
save_steps: 50
save_total_limit: 3
save_safetensors: true
gradient_checkpointing: true

lr_scheduler: cosine
optimizer: "adamw_bnb_8bit"
adam_beta2: 0.95
adam_epsilon: 0.00001
weight_decay: 0.1
learning_rate: 0.0005
max_grad_norm: 1.0
warmup_ratio: 0.05
# warmup_steps: 100

flash_attention: true

# Resume from a specific checkpoint dir
resume_from_checkpoint:

# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
# Be careful with this being turned on between different models.
# auto_resume_from_checkpoints: true

# wandb configuration if you're using it
# Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
wandb_project: "tiny-llama-sft"
wandb_name:
wandb_run_id:

special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"
```

</details>
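Both datasets are loaded in ShareGPT format and rendered as ChatML during preprocessing (`type: sharegpt`, `conversation: chatml`, with `<|im_start|>` / `<|im_end|>` added as extra tokens). The sketch below only illustrates that target format so it is clear what the model sees at training time; it is not axolotl's actual preprocessing code, and the role mapping is an assumption based on the usual ShareGPT field names.

```python
# Illustration of the ChatML target format used by `conversation: chatml`.
# Not axolotl's code path; the "from"/"value" keys and role names are the
# conventional ShareGPT layout and are assumed here.

ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_chatml(conversations):
    """Render a list of {"from": ..., "value": ...} turns as ChatML text."""
    parts = []
    for turn in conversations:
        role = ROLE_MAP.get(turn["from"], turn["from"])
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(parts)

example = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "What is sample packing?"},
    {"from": "gpt", "value": "Packing concatenates short examples into one fixed-length sequence."},
]
print(sharegpt_to_chatml(example))
```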
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 476
- num_epochs: 1

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.0.dev0
- Pytorch 2.0.1
- Datasets 2.16.1
- Tokenizers 0.15.0
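The derived hyperparameters above follow from the axolotl config. The snippet below is a rough sanity check of that arithmetic, assuming a single GPU (an assumption that is consistent with the reported total_train_batch_size of 64); the implied total step count is only an estimate backed out from the warmup ratio.

```python
# Back-of-the-envelope check of the derived training values.
micro_batch_size = 32
gradient_accumulation_steps = 2
num_gpus = 1  # assumption, consistent with total_train_batch_size = 64

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(total_train_batch_size)  # 64

# warmup_ratio = 0.05 with the reported 476 warmup steps implies on the order
# of 9,500 optimizer steps for the single epoch (the exact count depends on the
# packed dataset size).
warmup_ratio = 0.05
reported_warmup_steps = 476
print(reported_warmup_steps / warmup_ratio)  # 9520.0
```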