---
license: apache-2.0
datasets:
- bertin-project/bonanza-hf
- bertin-project/zenobia-instruct-hf
language:
- es
- ca
pipeline_tag: text-generation
---

# Gromenauer-7B-Instruct

<div align="center">
<img alt="gromenauer-7B logo" src="https://huggingface.co/bertin-project/Gromenauer-7B/resolve/main/images/gromenauer.png" width="200px">
</div>

## Overview

Gromenauer-7B-Instruct is an instruction-tuned version of [bertin-project/Gromenauer-7B](https://huggingface.co/bertin-project/Gromenauer-7B), fine-tuned on the [bertin-project/bonanza-hf](https://huggingface.co/datasets/bertin-project/bonanza-hf) and [bertin-project/zenobia-instruct-hf](https://huggingface.co/datasets/bertin-project/zenobia-instruct-hf) datasets.
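Since the model inherits the zephyr-7b-beta tokenizer (see Training Details), prompts presumably follow the Zephyr chat format. A minimal sketch, assuming that template applies here; `build_prompt` is a hypothetical helper, not part of the model's API:

```python
def build_prompt(system: str, user: str) -> str:
    # Zephyr-style chat format (assumed from the zephyr-7b-beta tokenizer):
    # each turn is tagged <|system|>, <|user|> or <|assistant|> and closed
    # with the </s> end-of-sequence token; generation continues after
    # the final <|assistant|> tag.
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_prompt(
    "Eres un asistente que responde en español.",
    "¿Qué es un gromenauer?",
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template` on the loaded tokenizer is the safer way to produce this string, since it uses the template shipped with the tokenizer itself.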

## Model Details

- **Model Type**: Mistral
- **Sequence Length**: 8192
- **Hidden Dimension**: 4096
- **Intermediate Dimension**: 14336
- **Number of Layers**: 32
- **Number of Attention Heads**: 32
- **Number of Key-Value Heads**: 8
- **Activation Function**: SiLU
- **Initializer Range**: 0.02
- **Layer Norm Epsilon**: 1.0e-05
- **Use Flash Attention**: Yes
- **Gradient Checkpointing**: Enabled (block size: 5)
- **Sliding Window Attention**: 4096
- **Use Bias**: No
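Having fewer key-value heads (8) than attention heads (32) means the model uses grouped-query attention. A quick sanity check of the sizes this implies, assuming the standard Mistral layout where the head dimension is the hidden dimension divided by the number of attention heads:

```python
hidden_dim = 4096
n_heads = 32
n_kv_heads = 8

head_dim = hidden_dim // n_heads     # per-head dimension: 4096 / 32 = 128
q_proj_dim = n_heads * head_dim      # query projection width: 32 * 128 = 4096
kv_proj_dim = n_kv_heads * head_dim  # key/value projection width: 8 * 128 = 1024
group_size = n_heads // n_kv_heads   # query heads sharing each KV head: 4

print(head_dim, q_proj_dim, kv_proj_dim, group_size)
```

The narrower K/V projections (1024 vs. 4096) are what shrink the KV cache relative to standard multi-head attention, which matters at the 8192-token sequence length.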

## Training Details

- **Tokenizer**: [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- **Batch Size**: 512
- **Learning Rate**: 1e-5
- **Optimizer**: Adam (beta1=0.9, beta2=0.95, epsilon=1e-8)
- **Weight Decay**: 0.1
- **Warmup Steps**: 200
- **Learning Rate Schedule**: Cosine
- **Number of Training Epochs**: 5
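The schedule above (200 warmup steps into a cosine decay) can be sketched as follows. The total step count is a placeholder, since the card does not state it; a linear warmup and decay to zero are also assumptions, as these details are not specified:

```python
import math

PEAK_LR = 1e-5
WARMUP_STEPS = 200
TOTAL_STEPS = 10_000  # placeholder: actual value depends on dataset size over 5 epochs

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed linear warmup + cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak toward 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate ramps from 0 to 1e-5 over the first 200 steps, then follows the cosine curve down toward 0 at the final step.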