---
tags:
  - llama
  - instruct
  - finetune
  - chatml
  - gpt4
  - synthetic data
  - distillation
model-index:
  - name: Meta-Llama-3.1-8B-openhermes-2.5
    results: []
license: apache-2.0
language:
  - en
library_name: transformers
datasets:
  - teknium/OpenHermes-2.5
---

# Model Card for Meta-Llama-3.1-8B-openhermes-2.5

This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset.

## Model Details

### Model Description

This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks.

- **Developed by:** artificialguybr
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** apache-2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B

### Model Sources

- **Repository:** https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5

## Uses

This model is intended for general natural language processing tasks, particularly instruction following and language understanding.

### Direct Use

The model can be used directly for text generation, question answering, and assistant-style chat, as in the sketch below.
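
The model can be loaded with the 🤗 Transformers `pipeline` API. The snippet below is a minimal sketch rather than an official example from the repository: it assumes the tokenizer ships a chat template (the `chatml` tag suggests ChatML formatting), and the prompt is purely illustrative.

```python
import torch
from transformers import pipeline

# Minimal chat-style generation sketch; `device_map="auto"` requires accelerate.
pipe = pipeline(
    "text-generation",
    model="artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain instruction tuning in two sentences."}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```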

### Out-of-Scope Use

The model should not be used for generating harmful or biased content. Users should be aware of potential biases in the training data.

## Training Details

### Training Data

The model was fine-tuned on the teknium/OpenHermes-2.5 dataset.
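
For reference, the corpus can be inspected with 🤗 Datasets. A minimal sketch; the `conversations` field name follows the OpenHermes-2.5 dataset card, which stores ShareGPT-style multi-turn records.

```python
from datasets import load_dataset

# OpenHermes-2.5 ships a single train split of roughly 1M records.
ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds[0]["conversations"])  # list of {"from": ..., "value": ...} turns
```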

### Training Procedure

#### Training Hyperparameters

- **Training regime:** BF16 mixed precision
- **Optimizer:** AdamW
- **Learning rate:** ≈2.49e-6 at the start of training, decaying thereafter
- **Batch size:** Not specified (gradient accumulation steps: 8)
- **Training steps:** 13,368
- **Evaluation strategy:** Steps, evaluating every 1/6 of the total training steps (a fractional `eval_steps` is interpreted as a ratio of the run, i.e., six evaluations in total)
- **Gradient checkpointing:** Enabled
- **Weight decay:** 0
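
For orientation, these values map onto 🤗 Transformers `TrainingArguments` roughly as below. This is a hedged reconstruction, not the actual Axolotl configuration: anything the card does not state (per-device batch size, scheduler type, warmup) is left at its default here.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Meta-Llama-3.1-8B-openhermes-2.5",
    bf16=True,                      # BF16 mixed precision
    optim="adamw_torch",            # AdamW
    learning_rate=2.49e-6,          # reported starting LR (decays over training)
    weight_decay=0.0,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    max_steps=13_368,
    eval_strategy="steps",          # `evaluation_strategy` on older Transformers
    eval_steps=1 / 6,               # a float below 1 is a ratio of total steps
)
```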

#### Hardware and Software

- **Hardware:** NVIDIA A100-SXM4-80GB (1 GPU)
- **Software Framework:** 🤗 Transformers, Axolotl

## Evaluation

### Metrics

- **Loss:** 0.6727465987205505 (evaluation)
- **Perplexity:** Not reported (though derivable from the loss; see below)
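
Although perplexity is not reported, for a causal LM whose evaluation loss is the mean per-token cross-entropy (in nats) it follows directly from the loss. A back-of-envelope check, assuming that interpretation of the number above:

```python
import math

# perplexity = exp(mean per-token cross-entropy)
print(math.exp(0.6727465987205505))  # ≈ 1.96
```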

### Results

- **Evaluation runtime:** 2,676.4173 seconds (≈44.6 minutes)
- **Samples per second:** 18.711
- **Steps per second:** 18.711 (matching the samples-per-second figure, which suggests an evaluation batch size of 1)

## Model Architecture

- **Model Type:** LlamaForCausalLM
- **Hidden size:** 4,096
- **Intermediate size:** 14,336
- **Number of attention heads:** 32 (with 8 key/value heads), assuming the base Llama-3.1-8B configuration is unchanged
- **Number of layers:** 32, under the same assumption
- **Activation function:** SiLU
- **Vocabulary size:** 128,256
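
The unspecified fields can be read from the hosted `config.json` without downloading any weights. A sketch; the printed values are expected to match the base Llama-3.1-8B architecture, assuming the fine-tune left it unchanged.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5")
print(config.num_hidden_layers)    # expected: 32
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 8 (grouped-query attention)
```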

## Limitations and Biases

No systematic analysis of this fine-tune's limitations or biases has been published. It should be expected to inherit the limitations and biases of the Llama-3.1-8B base model, as well as any artifacts of the GPT-4-generated synthetic data in OpenHermes-2.5.