---
license: apache-2.0
datasets:
  - bertin-project/bonanza-hf
  - bertin-project/zenobia-instruct-hf
language:
  - es
  - ca
pipeline_tag: text-generation
---

# Gromenauer-7B-Instruct

*(gromenauer-7B logo)*

## Overview

Gromenauer-7B-Instruct is an instruct fine-tuned version of the `bertin-project/Gromenauer-7B` model, trained on the `bertin-project/bonanza-hf` and `bertin-project/zenobia-instruct-hf` datasets.
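Since the model was fine-tuned with the `HuggingFaceH4/zephyr-7b-beta` tokenizer (see Training Details below), prompts presumably follow the zephyr chat format. A minimal sketch of building a single-turn prompt by hand — the `<|system|>`/`<|user|>`/`<|assistant|>` markers are an assumption carried over from zephyr, not something this card documents:

```python
def build_zephyr_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the zephyr chat format
    (assumed from the HuggingFaceH4/zephyr-7b-beta tokenizer;
    not stated explicitly in this model card)."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = build_zephyr_prompt(
    "Eres un asistente útil.",
    "Explica brevemente qué es el proyecto BERTIN.",
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template(...)` from 🤗 Transformers would produce this formatting automatically from a list of chat messages.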

## Model Details

- **Model Type:** Mistral
- **Sequence Length:** 8192
- **Hidden Dimension:** 4096
- **Intermediate Dimension:** 14336
- **Number of Layers:** 32
- **Number of Attention Heads:** 32
- **Number of Key-Value Heads:** 8
- **Activation Function:** SiLU
- **Initializer Range:** 0.02
- **Layer Norm Epsilon:** 1.0e-05
- **Use Flash Attention:** Yes
- **Gradient Checkpointing:** Enabled (block size: 5)
- **Sliding Window Attention:** 4096
- **Use Bias:** No
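As a sanity check, the hyperparameters above are consistent with the ~7B parameter count in the model name. A rough estimate, assuming the standard Mistral layout with a head dimension of 128 and a 32,000-token vocabulary (neither is stated in this card):

```python
# Rough parameter count from the architecture hyperparameters above.
# head_dim and vocab are assumptions (standard Mistral-7B values),
# not stated in the model card.
hidden, inter, layers = 4096, 14336, 32
heads, kv_heads, head_dim = 32, 8, 128
vocab = 32_000  # assumption

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj, v_proj (grouped-query: 8 KV heads)
attn += heads * head_dim * hidden          # o_proj
mlp = 3 * hidden * inter                   # gate, up, down projections (SwiGLU/SiLU)
total = layers * (attn + mlp) + 2 * vocab * hidden  # + embeddings and LM head

print(f"{total / 1e9:.2f}B parameters")  # → 7.24B parameters
```

The grouped-query attention term reflects the 8 key-value heads shared across 32 query heads, which shrinks the K/V projections (and the KV cache) by 4× relative to full multi-head attention.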

## Training Details

- **Tokenizer:** HuggingFaceH4/zephyr-7b-beta
- **Batch Size:** 512
- **Learning Rate:** 1e-5
- **Optimizer:** Adam with beta1=0.9, beta2=0.95, epsilon=1e-8
- **Weight Decay:** 0.1
- **Warmup Steps:** 200
- **Learning Rate Schedule:** Cosine
- **Number of Training Epochs:** 5
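The warmup-plus-cosine schedule above can be sketched as linear warmup to the peak rate followed by cosine decay. Only the peak learning rate (1e-5) and warmup length (200 steps) come from this card; the total step count and final learning rate below are illustrative assumptions:

```python
import math

def lr_at(step: int, max_lr: float = 1e-5, warmup: int = 200,
          total_steps: int = 10_000, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr over `warmup` steps, then cosine decay
    to min_lr at `total_steps`. total_steps and min_lr are illustrative;
    the card specifies only max_lr=1e-5 and warmup=200."""
    if step < warmup:
        return max_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Peak at the end of warmup; half the peak at the schedule midpoint.
print(lr_at(200), lr_at(5100))
```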