|
--- |
|
tags: |
|
- llama |
|
- instruct |
|
- finetune |
|
- chatml |
|
- gpt4 |
|
- synthetic data |
|
- distillation |
|
model-index: |
|
- name: Meta-Llama-3.1-8B-openhermes-2.5 |
|
results: [] |
|
license: apache-2.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
datasets: |
|
- teknium/OpenHermes-2.5 |
|
--- |
|
|
|
# Model Card for Meta-Llama-3.1-8B-openhermes-2.5 |
|
|
|
This model is a fine-tuned version of Meta-Llama-3.1-8B on the OpenHermes-2.5 dataset. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This is a fine-tuned version of the Meta-Llama-3.1-8B model, trained on the OpenHermes-2.5 dataset. It is designed for instruction following and general language tasks. |
|
|
|
- **Developed by:** artificialguybr |
|
- **Model type:** Causal Language Model |
|
- **Language(s):** English |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B |
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://huggingface.co/artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5 |
|
|
|
## Uses |
|
|
|
This model can be used for various natural language processing tasks, particularly those involving instruction following and general language understanding. |
|
|
|
### Direct Use |
|
|
|
The model can be used for tasks such as text generation, question answering, and other language-related applications. |
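
Below is a minimal generation sketch using 🤗 Transformers. The chat template and generation settings are assumptions rather than details confirmed by this card; the `chatml` tag suggests ChatML-style prompts, but check the tokenizer's chat template before relying on it.

```python
# Minimal sketch: load the model and generate a reply to an instruction.
# Assumes a GPU with enough memory for an 8B model in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The chatml tag suggests ChatML-style prompts; apply_chat_template uses
# whatever template the tokenizer ships with (verify before production use).
messages = [{"role": "user", "content": "Explain gradient checkpointing in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```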
|
|
|
### Out-of-Scope Use |
|
|
|
The model should not be used for generating harmful or biased content. Users should be aware of potential biases in the training data. |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was fine-tuned on the teknium/OpenHermes-2.5 dataset. |
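
A hedged sketch for inspecting the dataset with the 🤗 `datasets` library; the exact column layout should be checked against the dataset card rather than assumed.

```python
# Sketch: peek at the fine-tuning data. Field names may differ; check the
# dataset card for the actual schema before building preprocessing around it.
from datasets import load_dataset

ds = load_dataset("teknium/OpenHermes-2.5", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # one raw example
```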
|
|
|
### Training Procedure |
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** BF16 mixed precision |
|
- **Optimizer:** AdamW |
|
- **Learning rate:** 2.49e-6 (initial, with decay)
|
- **Batch size:** Not specified (gradient accumulation steps: 8) |
|
- **Training steps:** 13,368 |
|
- **Evaluation strategy:** Steps (eval_steps = 1/6, i.e. an evaluation roughly every sixth of the total training steps)
|
- **Gradient checkpointing:** Enabled |
|
- **Weight decay:** 0 |
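
The original Axolotl config is not included in this card; the sketch below only illustrates how the settings listed above might map onto 🤗 `TrainingArguments`, with unlisted values (per-device batch size, scheduler, warmup) filled in as placeholders.

```python
# Illustrative mapping of the hyperparameters above onto TrainingArguments.
# Values not stated in the card (per_device_train_batch_size, scheduler,
# warmup) are placeholders, not the settings actually used.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,                          # BF16 mixed precision
    optim="adamw_torch",                # AdamW
    learning_rate=2.49e-6,              # initial LR from the card
    lr_scheduler_type="cosine",         # assumption: not stated in the card
    gradient_accumulation_steps=8,
    per_device_train_batch_size=1,      # assumption: not stated in the card
    max_steps=13_368,
    gradient_checkpointing=True,
    weight_decay=0.0,
    eval_strategy="steps",              # requires transformers >= 4.41
)
```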
|
|
|
#### Hardware and Software |
|
|
|
- **Hardware:** NVIDIA A100-SXM4-80GB (1 GPU) |
|
- **Software Framework:** 🤗 Transformers, Axolotl |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
- **Evaluation loss:** 0.6727
|
- **Perplexity:** Not reported (a rough figure can be derived from the evaluation loss; see the sketch below)
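
Since perplexity is the exponential of the cross-entropy loss, a rough value can be derived from the evaluation loss above, assuming the reported loss is a mean per-token cross-entropy in nats (the 🤗 Trainer default for causal LMs):

```python
# Perplexity = exp(mean cross-entropy loss), assuming the eval loss is a
# per-token cross-entropy in nats.
import math

eval_loss = 0.6727
print(math.exp(eval_loss))  # ≈ 1.96
```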
|
|
|
### Results |
|
|
|
- **Evaluation runtime:** 2,676.4 seconds (≈ 45 minutes)
|
- **Samples per second:** 18.711 |
|
- **Steps per second:** 18.711 |
|
|
|
## Model Architecture |
|
|
|
- **Model Type:** LlamaForCausalLM |
|
- **Hidden size:** 4,096 |
|
- **Intermediate size:** 14,336 |
|
- **Number of attention heads:** Not specified |
|
- **Number of layers:** Not specified |
|
- **Activation function:** SiLU |
|
- **Vocabulary size:** 128,256 |
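
The values listed above, including the ones the card leaves unspecified (attention heads, layer count), can be read directly from the model's config without downloading the weights; a minimal sketch:

```python
# Sketch: read the architecture fields straight from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("artificialguybr/Meta-Llama-3.1-8B-openhermes-2.5")
print(config.hidden_size)          # 4096
print(config.intermediate_size)    # 14336
print(config.num_attention_heads)  # not stated in the card
print(config.num_hidden_layers)    # not stated in the card
print(config.vocab_size)           # 128256
```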
|
|
|
## Limitations and Biases |
|
|
|
More information is needed about specific limitations and biases of this model. |
|
|