---
tags:
- llama
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Meta-Llama-3.1-8B-openhermes-2.5
  results: []
license: apache-2.0
language:
- en
library_name: transformers
datasets:
- teknium/OpenHermes-2.5
---

# Model Card for Meta-Llama-3.1-8B-openhermes-2.5

This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset.

## Model Details

### Model Description

This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks.

- **Developed by:** artificialguybr
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B

### Model Sources

- **Repository:** https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5

## Uses

This model can be used for a range of natural language processing tasks, particularly those involving instruction following and general language understanding.

### Direct Use

The model can be used for tasks such as text generation, question answering, and other language-related applications.

### Out-of-Scope Use

The model should not be used to generate harmful or biased content. Users should be aware of potential biases inherited from the training data.

## Training Details

### Training Data

The model was fine-tuned on the teknium/OpenHermes-2.5 dataset.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** BF16 mixed precision
- **Optimizer:** AdamW
- **Learning rate:** started at ≈2.49e-6 and decayed over training
- **Batch size:** not specified (gradient accumulation steps: 8)
- **Training steps:** 13,368
- **Evaluation strategy:** steps (eval_steps = 0.1667, i.e., roughly every 1/6 of the total training steps)
- **Gradient checkpointing:** enabled
- **Weight decay:** 0

#### Hardware and Software

- **Hardware:** NVIDIA A100-SXM4-80GB (1 GPU)
- **Software Framework:** 🤗 Transformers, Axolotl

## Evaluation

### Metrics

- **Loss:** 0.6727 (evaluation)
- **Perplexity:** not provided

### Results

- **Evaluation runtime:** 2,676.42 seconds
- **Samples per second:** 18.711
- **Steps per second:** 18.711

## Model Architecture

- **Model type:** LlamaForCausalLM
- **Hidden size:** 4,096
- **Intermediate size:** 14,336
- **Number of attention heads:** not specified
- **Number of layers:** not specified
- **Activation function:** SiLU
- **Vocabulary size:** 128,256

## Limitations and Biases

More information is needed about the specific limitations and biases of this model.
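
## How to Get Started with the Model

A minimal loading and generation sketch with 🤗 Transformers, using the repository id from the Model Sources section. It assumes the checkpoint exposes the standard Llama causal LM interface and that the tokenizer ships a chat template matching the ChatML-style prompts suggested by the tags; both are assumptions rather than confirmed details of this model.

```python
# Hedged usage sketch: repository id taken from the Model Sources section above.
# The chat-template call assumes the tokenizer provides one (an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model card reports BF16 mixed-precision training
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain instruction tuning in one sentence."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Sampling parameters such as `temperature` and `max_new_tokens` are illustrative defaults, not values recommended by the model author.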