---
license: other
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: google/gemma-2b
model-index:
- name: gemma-2b-spanishbillionwords
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# gemma-2b-spanishbillionwords

This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the [Spanish Billion Words](https://huggingface.co/datasets/jhonparra18/spanish_billion_words_clean) dataset.
The base Gemma model was fine-tuned to improve its performance on Spanish-language text.
It achieves the following results on the evaluation set:
- Loss: 4.3306
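
A loss value on its own can be hard to interpret, so it may help to read it as a perplexity. The sketch below assumes the reported evaluation loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated explicitly in this card.

```python
import math

# Final evaluation loss reported above
# (assumed to be mean per-token cross-entropy in nats).
eval_loss = 4.3306

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 76.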

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- training_steps: 60
- mixed_precision_training: Native AMP
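
To make the schedule concrete, the following standard-library sketch reproduces the learning rate implied by these values. It assumes that the `linear` scheduler has the usual shape (linear warmup from zero, then linear decay to zero at the final step) and that training ran on a single device; the function name `linear_lr` is illustrative, not part of any library.

```python
# Hyperparameters copied from the list above.
learning_rate = 3e-4
train_batch_size = 1
gradient_accumulation_steps = 2
warmup_steps = 1
training_steps = 60


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (linear warmup, then linear decay)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, training_steps - step)
    return learning_rate * remaining / max(1, training_steps - warmup_steps)


# Examples consumed per optimizer update (single device assumed).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

for step in (0, 1, 30, 60):
    print(step, linear_lr(step))
```

With a single warmup step, the rate reaches its peak of 0.0003 after the first update and decays linearly for the rest of the run.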

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 5.1254        | 0.0   | 1    | 5.0205          |
| 4.3187        | 0.0   | 2    | 5.0029          |
| 3.8173        | 0.0   | 3    | 4.9801          |
| 5.3879        | 0.0   | 4    | 4.9582          |
| 5.718         | 0.0   | 5    | 4.9343          |
| 5.8628        | 0.0   | 6    | 4.9104          |
| 4.5401        | 0.0   | 7    | 4.8830          |
| 4.4219        | 0.0   | 8    | 4.8539          |
| 5.5169        | 0.0   | 9    | 4.8234          |
| 4.813         | 0.0   | 10   | 4.7878          |
| 4.2111        | 0.0   | 11   | 4.7576          |
| 4.6504        | 0.0   | 12   | 4.7314          |
| 3.7923        | 0.0   | 13   | 4.7116          |
| 3.7773        | 0.0   | 14   | 4.6890          |
| 4.6773        | 0.0   | 15   | 4.6616          |
| 3.0179        | 0.0   | 16   | 4.6329          |
| 3.8922        | 0.0   | 17   | 4.6099          |
| 4.3289        | 0.0   | 18   | 4.5940          |
| 5.0925        | 0.0   | 19   | 4.5822          |
| 4.6499        | 0.0   | 20   | 4.5711          |
| 3.9758        | 0.0   | 21   | 4.5585          |
| 4.593         | 0.0   | 22   | 4.5454          |
| 5.2496        | 0.0   | 23   | 4.5346          |
| 4.2548        | 0.0   | 24   | 4.5217          |
| 3.5209        | 0.0   | 25   | 4.5059          |
| 4.4781        | 0.0   | 26   | 4.4930          |
| 5.4472        | 0.0   | 27   | 4.4834          |
| 4.1987        | 0.0   | 28   | 4.4756          |
| 5.2324        | 0.0   | 29   | 4.4684          |
| 4.8068        | 0.0   | 30   | 4.4593          |
| 3.5455        | 0.0   | 31   | 4.4521          |
| 3.6516        | 0.0   | 32   | 4.4415          |
| 4.1368        | 0.0   | 33   | 4.4289          |
| 6.4659        | 0.0   | 34   | 4.4289          |
| 3.434         | 0.0   | 35   | 4.4173          |
| 3.9518        | 0.0   | 36   | 4.4085          |
| 3.0758        | 0.0   | 37   | 4.4008          |
| 3.6492        | 0.0   | 38   | 4.3930          |
| 4.0352        | 0.0   | 39   | 4.3857          |
| 5.6527        | 0.0   | 40   | 4.3799          |
| 4.233         | 0.0   | 41   | 4.3747          |
| 5.4082        | 0.0   | 42   | 4.3702          |
| 5.1255        | 0.0   | 43   | 4.3661          |
| 4.4567        | 0.0   | 44   | 4.3622          |
| 4.1874        | 0.0   | 45   | 4.3587          |
| 4.3441        | 0.0   | 46   | 4.3555          |
| 4.1636        | 0.0   | 47   | 4.3524          |
| 4.3146        | 0.0   | 48   | 4.3495          |
| 4.6414        | 0.0   | 49   | 4.3473          |
| 4.3666        | 0.0   | 50   | 4.3451          |
| 3.8627        | 0.0   | 51   | 4.3427          |
| 4.5875        | 0.0   | 52   | 4.3406          |
| 6.0364        | 0.0   | 53   | 4.3387          |
| 4.5669        | 0.0   | 54   | 4.3369          |
| 4.5585        | 0.0   | 55   | 4.3353          |
| 2.7858        | 0.0   | 56   | 4.3340          |
| 4.1845        | 0.0   | 57   | 4.3329          |
| 4.4489        | 0.0   | 58   | 4.3319          |
| 5.3263        | 0.0   | 59   | 4.3311          |
| 5.3856        | 0.0   | 60   | 4.3306          |
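
Two derived numbers summarize the table: how far the validation loss fell and how many training examples the run saw. The sketch uses only figures from this card; that every step consumed exactly `total_train_batch_size` examples is an assumption.

```python
# First and last validation losses from the table above.
first_val_loss = 5.0205  # step 1
last_val_loss = 4.3306   # step 60

absolute_drop = first_val_loss - last_val_loss
relative_drop = absolute_drop / first_val_loss

# Examples seen = training_steps * total_train_batch_size (values from this card).
examples_seen = 60 * 2

print(f"validation loss fell by {absolute_drop:.4f} ({relative_drop:.1%})")
print(f"training examples seen: {examples_seen}")
```

An example count this small is consistent with the Epoch column staying at 0.0 for the whole run.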


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.0
- Pytorch 2.2.1+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
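
Because this is a PEFT adapter rather than a full checkpoint, it has to be attached to the base model at load time. The following is a minimal sketch, not a tested recipe: `adapter_id` is a placeholder for wherever the adapter weights live (a Hub repo id or a local path, neither of which is given in this card), and the helper names `load_spanish_gemma` and `complete` are illustrative.

```python
def load_spanish_gemma(adapter_id: str, base_id: str = "google/gemma-2b"):
    """Load the base model and attach the PEFT adapter weights.

    `adapter_id` is a placeholder: pass the Hub repo id or local path of this adapter.
    """
    # Imported lazily so the heavy dependencies load only when the function is called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


def complete(tokenizer, model, prompt: str, max_new_tokens: int = 40) -> str:
    """Generate a continuation for a Spanish prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would look like `tokenizer, model = load_spanish_gemma("path/to/adapter")` followed by `complete(tokenizer, model, "La ciudad de")`. Downloading the base model may first require accepting the Gemma terms on the Hub.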