---
license: apache-2.0
datasets:
- bertin-project/bonanza-hf
- bertin-project/zenobia-instruct-hf
language:
- es
- ca
pipeline_tag: text-generation
---

# Gromenauer-7B-Instruct
*gromenauer-7B logo*
## Overview

Gromenauer-7B-Instruct is an instruction-tuned version of the [bertin-project/Gromenauer-7B](https://huggingface.co/bertin-project/Gromenauer-7B) model, fine-tuned on the [bertin-project/bonanza-hf](https://huggingface.co/datasets/bertin-project/bonanza-hf) and [bertin-project/zenobia-instruct-hf](https://huggingface.co/datasets/bertin-project/zenobia-instruct-hf) datasets.

## Model Details

- **Model Type**: Mistral
- **Sequence Length**: 8192
- **Hidden Dimension**: 4096
- **Intermediate Dimension**: 14336
- **Number of Layers**: 32
- **Number of Attention Heads**: 32
- **Number of Key-Value Heads**: 8
- **Activation Function**: SiLU
- **Initializer Range**: 0.02
- **Layer Norm Epsilon**: 1.0e-05
- **Use Flash Attention**: Yes
- **Gradient Checkpointing**: Enabled (Block Size: 5)
- **Sliding Window Attention**: 4096
- **Use Bias**: No

## Training Details

- **Tokenizer**: [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- **Batch Size**: 512
- **Learning Rate**: 1e-5
- **Optimizer**: Adam (beta1=0.9, beta2=0.95, epsilon=1e-8)
- **Weight Decay**: 0.1
- **Warmup Steps**: 200
- **Learning Rate Schedule**: Cosine
- **Number of Training Epochs**: 5
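For reference, the hyperparameters above map roughly onto a Hugging Face `TrainingArguments` configuration as sketched below. This is an illustrative reconstruction rather than the actual training script: the per-device batch size and gradient-accumulation split are assumptions chosen only so that their product matches the effective batch size of 512 listed above, and the mixed-precision setting is likewise assumed.

```python
# Illustrative mapping of the listed hyperparameters to TrainingArguments.
# Only the values documented in the card (epochs, learning rate, schedule,
# warmup, weight decay, Adam betas/epsilon, effective batch size of 512) are
# taken from the card; the rest are assumptions for the sketch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gromenauer-7b-instruct",
    num_train_epochs=5,
    per_device_train_batch_size=8,    # assumed split
    gradient_accumulation_steps=64,   # 8 * 64 = 512 effective batch size on one device
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    bf16=True,                        # assumed mixed-precision setting
    gradient_checkpointing=True,
)
```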
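To try the model, the snippet below is a minimal sketch using the Hugging Face `transformers` library. It assumes the tokenizer ships a Zephyr-style chat template (the model uses the zephyr-7b-beta tokenizer noted above); the dtype and sampling parameters are illustrative placeholders, not recommended settings.

```python
# Minimal sketch: load Gromenauer-7B-Instruct and generate a reply.
# Sampling parameters and dtype are illustrative, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bertin-project/Gromenauer-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

# Build the prompt with the tokenizer's chat template (assumed Zephyr format).
messages = [
    {"role": "user", "content": "Explica brevemente qué es un refrán."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```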