bertin-project/Gromenauer-7B-Instruct

alvp committed on Jun 24

Commit a04fbcb, 1 parent: a43ecc5

Update README.md

Files changed (1)

README.md +46 -3

@@ -1,3 +1,46 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+datasets:
+- bertin-project/bonanza-hf
+- bertin-project/zenobia-instruct-hf
+language:
+- es
+- ca
+pipeline_tag: text-generation
+---
+# Gromenauer-7B-Instruct
+<div align=center>
+<img alt="gromenauer-7B logo" src="https://huggingface.co/bertin-project/Gromenauer-7B/resolve/main/images/gromenauer.png" width="200px">
+</div>
+## Overview
+Gromenauer-7B-Instruct is an instruct fine-tuned version of the [bertin-project/Gromenauer-7B](https://huggingface.co/bertin-project/Gromenauer-7B) model using the [bertin-project/bonanza-hf](https://huggingface.co/datasets/bertin-project/bonanza-hf) and [bertin-project/zenobia-instruct-hf](https://huggingface.co/datasets/bertin-project/zenobia-instruct-hf) datasets.
+## Model Details
+- **Model Type**: Mistral
+- **Sequence Length**: 8192
+- **Hidden Dimension**: 4096
+- **Intermediate Dimension**: 14336
+- **Number of Layers**: 32
+- **Number of Attention Heads**: 32
+- **Number of Key-Value Heads**: 8
+- **Activation Function**: SiLU
+- **Initializer Range**: 0.02
+- **Layer Norm Epsilon**: 1.0e-05
+- **Use Flash Attention**: Yes
+- **Gradient Checkpointing**: Enabled (Block Size: 5)
+- **Sliding Window Attention**: 4096
+- **Use Bias**: No
+## Training Details
+- **Tokenizer**: [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+- **Batch Size**: 512
+- **Learning Rate**: 1e-5
+- **Optimizer**: Adam with beta1=0.9, beta2=0.95, epsilon=1e-8
+- **Weight Decay**: 0.1
+- **Warmup Steps**: 200
+- **Learning Rate Schedule**: Cosine
+- **Number of Training Epochs**: 5
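
The hyperparameters added in this diff can be sanity-checked with a short Python sketch. It is not part of the commit and is not the authors' training code. It derives the per-head dimension and the grouped-query ratio from the Model Details list, and shows one plausible reading of "Warmup Steps: 200" with a cosine schedule. The README does not say that the warmup is linear, what the final learning rate is, or how many optimizer steps the run took, so the linear warmup, `min_lr` and `total_steps` below are assumptions, and the function names are hypothetical.

```python
import math

# Values copied from the README's "Model Details" and "Training Details" lists.
HIDDEN_DIM = 4096
NUM_HEADS = 32
NUM_KV_HEADS = 8
PEAK_LR = 1e-5
WARMUP_STEPS = 200


def attention_shapes(hidden_dim=HIDDEN_DIM, num_heads=NUM_HEADS,
                     num_kv_heads=NUM_KV_HEADS):
    """Return (head_dim, query_heads_per_kv_head).

    Assumes head_dim = hidden_dim / num_heads, as in the usual Mistral layout.
    """
    if hidden_dim % num_heads or num_heads % num_kv_heads:
        raise ValueError("head counts must divide evenly")
    return hidden_dim // num_heads, num_heads // num_kv_heads


def lr_at_step(step, total_steps, peak_lr=PEAK_LR,
               warmup_steps=WARMUP_STEPS, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    total_steps and min_lr are NOT given in the README; they are assumptions.
    """
    if step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    head_dim, group_size = attention_shapes()
    print(f"head_dim={head_dim}, query heads per KV head={group_size}")
    # 1000 total steps is an illustrative value, not taken from the README.
    for step in (0, 199, 600, 1000):
        print(step, lr_at_step(step, total_steps=1000))
```

With 32 attention heads sharing 8 key-value heads, each key-value head serves 4 query heads, which is what shrinks the KV cache relative to full multi-head attention.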