versae alvp committed on
Commit
4cfa58e
1 Parent(s): 6105a15

Create README.md (#1)


- Create README.md (eb0849e4c2205deaf7d381ecee8cee956f68d8b5)


Co-authored-by: Álvaro Pérez Pozo <alvp@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
---
license: apache-2.0
datasets:
- fistro/gromenauer
language:
- es
pipeline_tag: text-generation
---
# Bertin-Gromenauer

<div align="center">
<img alt="BERTIN-gromenauer logo" src="https://huggingface.co/bertin-project/bertin-gromenauer/resolve/main/images/gromenauer.png" width="200px">
</div>

## Overview

Bertin-Gromenauer is a Spanish language model designed to understand and generate high-quality Spanish text. Built on the Mistral architecture and trained on an extensive literary corpus, it captures a wide range of linguistic nuances, styles, and contexts found in Spanish literature.

## Model Details

- **Model Type**: Mistral
- **Sequence Length**: 8192
- **Hidden Dimension**: 4096
- **Intermediate Dimension**: 14336
- **Number of Layers**: 32
- **Number of Attention Heads**: 32
- **Number of Key-Value Heads**: 8
- **Activation Function**: SiLU
- **Initializer Range**: 0.02
- **Layer Norm Epsilon**: 1.0e-05
- **Use Flash Attention**: Yes
- **Gradient Checkpointing**: Enabled (Block Size: 5)
- **Sliding Window Attention**: 4096
- **Use Bias**: No

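These hyperparameters map directly onto `MistralConfig` in the `transformers` library. The snippet below is an illustrative sketch, not the repository's shipped `config.json`: the vocabulary size is an assumption taken from the Mistral-7B-v0.1 tokenizer, and flash attention and gradient checkpointing are runtime options rather than config fields.

```python
from transformers import MistralConfig

# Illustrative config mirroring the "Model Details" list above.
# Not the repository's actual config.json; vocab_size is assumed
# to match the Mistral-7B-v0.1 tokenizer.
config = MistralConfig(
    vocab_size=32000,              # assumption: Mistral-7B-v0.1 tokenizer
    max_position_embeddings=8192,  # sequence length
    hidden_size=4096,
    intermediate_size=14336,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,         # grouped-query attention
    hidden_act="silu",
    initializer_range=0.02,
    rms_norm_eps=1e-05,            # Mistral uses RMSNorm for its layer norms
    sliding_window=4096,
)
```
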
## Training Details

- **Tokenizer**: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- **Batch Size**: 512
- **Learning Rate**: 1e-5
- **Optimizer**: Adam with beta1=0.9, beta2=0.95, epsilon=1e-8
- **Weight Decay**: 0.1
- **Warmup Steps**: 200
- **Learning Rate Schedule**: Cosine
- **Number of Training Steps**: 7000

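The optimizer and schedule listed above can be reproduced with standard PyTorch and `transformers` utilities. A minimal sketch, assuming `model` is the loaded model and that "Adam with weight decay" corresponds to decoupled AdamW:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Sketch of the listed optimizer/schedule; the actual training code
# is not published in this card. Assumes `model` is already loaded.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.95),
    eps=1e-8,
    weight_decay=0.1,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=200,
    num_training_steps=7000,
)
```
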
## Usage

To load the model and generate text, you can use the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bertin-project/bertin-gromenauer")

# Load the model; AutoModelForCausalLM matches the text-generation pipeline tag
model = AutoModelForCausalLM.from_pretrained("bertin-project/bertin-gromenauer")

# Example usage: generate a continuation for a Spanish prompt
text = "Introduce aquí tu texto en español."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
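
Alternatively, the `text-generation` pipeline offers a one-line interface; the generation parameters here are illustrative:

```python
from transformers import pipeline

# One-liner via the text-generation pipeline; max_new_tokens is illustrative.
generator = pipeline("text-generation", model="bertin-project/bertin-gromenauer")
print(generator("Introduce aquí tu texto en español.", max_new_tokens=50)[0]["generated_text"])
```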