---
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- generator
library_name: peft
license: llama3
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: POC-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata
  results: []
---

# POC-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the MEDAL dataset (listed as `generator` in the metadata above).
It achieves the following results on the evaluation set:
- Loss: 2.2897

## Model description

See the accompanying article: https://medium.com/@frankmorales_91352/fine-tuning-meta-llama-3-8b-with-medal-a-refined-approach-for-enhanced-medical-language-b924d226b09d

## Training and evaluation data

- Fine-tuning notebook: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Meta_Llama_3_8B_for_MEDAL_EVALDATA.ipynb
- Evaluation notebook: https://github.com/frank-morales2020/MLxDL/blob/main/Meta_Llama_3_8B_for_MEDAL_EVALUATOR_evaldata.ipynb

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch expressing these settings as `TrainingArguments` is included at the end of this card):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.15
- num_epochs: 0.3

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.7645 | 0.0069 | 100 | 2.6720 |
| 2.5917 | 0.0138 | 200 | 2.5243 |
| 2.5054 | 0.0207 | 300 | 2.4705 |
| 2.4406 | 0.0277 | 400 | 2.4379 |
| 2.4272 | 0.0346 | 500 | 2.4136 |
| 2.4171 | 0.0415 | 600 | 2.3942 |
| 2.3908 | 0.0484 | 700 | 2.3793 |
| 2.3808 | 0.0553 | 800 | 2.3664 |
| 2.3588 | 0.0622 | 900 | 2.3571 |
| 2.3595 | 0.0692 | 1000 | 2.3494 |
| 2.3411 | 0.0761 | 1100 | 2.3421 |
| 2.3308 | 0.0830 | 1200 | 2.3369 |
| 2.3358 | 0.0899 | 1300 | 2.3320 |
| 2.3295 | 0.0968 | 1400 | 2.3270 |
| 2.337 | 0.1037 | 1500 | 2.3228 |
| 2.3182 | 0.1106 | 1600 | 2.3195 |
| 2.3334 | 0.1176 | 1700 | 2.3161 |
| 2.3278 | 0.1245 | 1800 | 2.3128 |
| 2.3151 | 0.1314 | 1900 | 2.3101 |
| 2.3245 | 0.1383 | 2000 | 2.3075 |
| 2.3073 | 0.1452 | 2100 | 2.3053 |
| 2.3094 | 0.1521 | 2200 | 2.3036 |
| 2.3101 | 0.1590 | 2300 | 2.3013 |
| 2.3102 | 0.1660 | 2400 | 2.2995 |
| 2.3042 | 0.1729 | 2500 | 2.2980 |
| 2.2942 | 0.1798 | 2600 | 2.2965 |
| 2.2876 | 0.1867 | 2700 | 2.2951 |
| 2.3077 | 0.1936 | 2800 | 2.2941 |
| 2.2851 | 0.2005 | 2900 | 2.2931 |
| 2.2766 | 0.2075 | 3000 | 2.2923 |
| 2.2873 | 0.2144 | 3100 | 2.2916 |
| 2.2971 | 0.2213 | 3200 | 2.2910 |
| 2.2942 | 0.2282 | 3300 | 2.2906 |
| 2.2872 | 0.2351 | 3400 | 2.2903 |
| 2.2996 | 0.2420 | 3500 | 2.2901 |
| 2.2855 | 0.2489 | 3600 | 2.2899 |
| 2.2969 | 0.2559 | 3700 | 2.2898 |
| 2.2871 | 0.2628 | 3800 | 2.2898 |
| 2.2905 | 0.2697 | 3900 | 2.2897 |
| 2.2915 | 0.2766 | 4000 | 2.2897 |
| 2.2921 | 0.2835 | 4100 | 2.2897 |
| 2.3087 | 0.2904 | 4200 | 2.2897 |
| 2.3017 | 0.2974 | 4300 | 2.2897 |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
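
### Training configuration (sketch)

The hyperparameters listed above map directly onto `transformers.TrainingArguments`. The snippet below is a minimal, hedged reconstruction rather than code from the original notebook: `output_dir`, the `optim` name, and `bf16` are assumptions, and the evaluation/logging cadence of 100 steps is inferred from the training-results table.

```python
from transformers import TrainingArguments

# Minimal sketch of the hyperparameters listed above, assuming transformers
# (and accelerate) are installed. output_dir, optim, and bf16 are assumptions.
training_args = TrainingArguments(
    output_dir="POC-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # total_train_batch_size: 8 on a single device
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), epsilon=1e-08 (defaults)
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    num_train_epochs=0.3,
    eval_strategy="steps",           # validation loss reported every 100 steps in the table above
    eval_steps=100,
    logging_steps=100,
    bf16=True,                       # assumption: precision is not stated in this card
)
```

These arguments can then be passed to TRL's `SFTTrainer` together with a PEFT (LoRA) configuration, as in the fine-tuning notebook linked above.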
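
## How to use (sketch)

This repository contains a PEFT adapter rather than full model weights, so inference requires loading the base model and attaching the adapter. The snippet below is a minimal sketch using the `transformers` and `peft` APIs listed in the framework versions; the adapter repo id is a placeholder, and the prompt, dtype, and `device_map` settings are illustrative assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"
# Placeholder: replace with the actual Hub repo id of this adapter.
adapter_id = "<user>/POC-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: dtype not stated in this card
    device_map="auto",           # requires accelerate
)
# Attach the fine-tuned adapter on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative MEDAL-style prompt (medical text with abbreviations).
prompt = "The patient presented with DM and HTN, managed with"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```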