---
license: apache-2.0
datasets:
  - bertin-project/bonanza-hf
  - bertin-project/zenobia-instruct-hf
language:
  - es
  - ca
pipeline_tag: text-generation
---

# Gromenauer-7B-Instruct

*(gromenauer-7B logo)*

## Overview

Gromenauer-7B-Instruct is an instruct fine-tuned version of the `bertin-project/Gromenauer-7B` model, trained on the `bertin-project/bonanza-hf` and `bertin-project/zenobia-instruct-hf` datasets.
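Since the model was fine-tuned with the `HuggingFaceH4/zephyr-7b-beta` tokenizer (see Training Details below), prompts presumably follow the zephyr chat format. A minimal sketch of building a single-turn prompt by hand — the `<|system|>`/`<|user|>`/`<|assistant|>` markers are an assumption carried over from zephyr, not something this card documents:

```python
def build_zephyr_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the zephyr chat format
    (assumed from the HuggingFaceH4/zephyr-7b-beta tokenizer;
    not stated explicitly in this model card)."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_zephyr_prompt(
    "Eres un asistente útil.",
    "Explica brevemente qué es el proyecto BERTIN.",
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template(...)` from 🤗 Transformers would produce this formatting automatically from a list of chat messages.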

## Model Details

- **Model Type:** Mistral
- **Sequence Length:** 8192
- **Hidden Dimension:** 4096
- **Intermediate Dimension:** 14336
- **Number of Layers:** 32
- **Number of Attention Heads:** 32
- **Number of Key-Value Heads:** 8
- **Activation Function:** SiLU
- **Initializer Range:** 0.02
- **Layer Norm Epsilon:** 1.0e-05
- **Use Flash Attention:** Yes
- **Gradient Checkpointing:** Enabled (block size: 5)
- **Sliding Window Attention:** 4096
- **Use Bias:** No
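As a sanity check, the hyperparameters above are consistent with the ~7B parameter count in the model name. A rough estimate, assuming the standard Mistral layout with a head dimension of 128 and a 32,000-token vocabulary (neither is stated in this card):

```python
# Rough parameter count from the architecture hyperparameters above.
# head_dim and vocab are assumptions (standard Mistral-7B values),
# not stated in the model card.
hidden, inter, layers = 4096, 14336, 32
heads, kv_heads, head_dim = 32, 8, 128
vocab = 32_000  # assumption

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj, v_proj (grouped-query: 8 KV heads)
attn += heads * head_dim * hidden          # o_proj
mlp = 3 * hidden * inter                   # gate, up, down projections (SwiGLU/SiLU)
total = layers * (attn + mlp) + 2 * vocab * hidden  # + embeddings and LM head

print(f"{total / 1e9:.2f}B parameters")  # → 7.24B parameters
```

The grouped-query attention term reflects the 8 key-value heads shared across 32 query heads, which shrinks the K/V projections (and the KV cache) by 4× relative to full multi-head attention.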

## Training Details

- **Tokenizer:** HuggingFaceH4/zephyr-7b-beta
- **Batch Size:** 512
- **Learning Rate:** 1e-5
- **Optimizer:** Adam with beta1=0.9, beta2=0.95, epsilon=1e-8
- **Weight Decay:** 0.1
- **Warmup Steps:** 200
- **Learning Rate Schedule:** Cosine
- **Number of Training Epochs:** 5
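The warmup-plus-cosine schedule above can be sketched as linear warmup to the peak rate followed by cosine decay. Only the peak learning rate (1e-5) and warmup length (200 steps) come from this card; the total step count and final learning rate below are illustrative assumptions:

```python
import math

def lr_at(step: int, max_lr: float = 1e-5, warmup: int = 200,
          total_steps: int = 10_000, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr over `warmup` steps, then cosine decay
    to min_lr at `total_steps`. total_steps and min_lr are illustrative;
    the card specifies only max_lr=1e-5 and warmup=200."""
    if step < warmup:
        return max_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Peak at the end of warmup; half the peak at the schedule midpoint.
print(lr_at(200), lr_at(5100))
```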