mcamara/gemma-2b-es-spanishbillionwords

Files changed (5)

README.md +66 -16
adapter_config.json +2 -2
adapter_model.safetensors +1 -1
runs/Mar11_14-41-39_byo-WS5/events.out.tfevents.1710164560.byo-WS5.257567.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.1108
 ## Model description
@@ -37,32 +37,82 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0001
 - train_batch_size: 1
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 2
-- training_steps: 10
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.2899        | 1.0   | 1    | 4.1511          |
-| 1.2899        | 2.0   | 2    | 4.1486          |
-| 1.269         | 3.0   | 3    | 4.1424          |
-| 1.2206        | 4.0   | 4    | 4.1363          |
-| 1.1768        | 5.0   | 5    | 4.1303          |
-| 1.1391        | 6.0   | 6    | 4.1232          |
-| 1.1083        | 7.0   | 7    | 4.1190          |
-| 1.0829        | 8.0   | 8    | 4.1162          |
-| 1.0633        | 9.0   | 9    | 4.1131          |
-| 1.05          | 10.0  | 10   | 4.1108          |
 ### Framework versions

 This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 4.3306
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0003
 - train_batch_size: 1
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 2
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 1
+- training_steps: 60
 - mixed_precision_training: Native AMP
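The hyperparameters listed above are related to one another: the reported `total_train_batch_size` is the product of `train_batch_size` and `gradient_accumulation_steps`, and `lr_scheduler_type: linear` with warmup describes a ramp up to `learning_rate` followed by a decay to zero. The sketch below illustrates both, assuming a single device and the common linear warmup/decay shape; the helper names `effective_batch_size` and `linear_lr` are hypothetical and not taken from the training code.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step.

    num_devices=1 is an assumption; the model card does not state it.
    """
    return per_device_batch * grad_accum_steps * num_devices


def linear_lr(step: int, base_lr: float, warmup_steps: int,
              total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Previous run: batch 1, accumulation 4 (card reports total_train_batch_size: 4)
    print(effective_batch_size(1, 4))
    # This run: batch 1, accumulation 2 (card reports total_train_batch_size: 2)
    print(effective_batch_size(1, 2))
    # Learning rate at a few steps for this run (lr=3e-4, warmup=1, steps=60)
    for step in (0, 1, 30, 60):
        print(step, linear_lr(step, 3e-4, 1, 60))
```

With only one warmup step out of 60, nearly the whole run is spent in the decay phase under this assumed schedule.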
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 5.1254        | 0.0   | 1    | 5.0205          |
+| 4.3187        | 0.0   | 2    | 5.0029          |
+| 3.8173        | 0.0   | 3    | 4.9801          |
+| 5.3879        | 0.0   | 4    | 4.9582          |
+| 5.718         | 0.0   | 5    | 4.9343          |
+| 5.8628        | 0.0   | 6    | 4.9104          |
+| 4.5401        | 0.0   | 7    | 4.8830          |
+| 4.4219        | 0.0   | 8    | 4.8539          |
+| 5.5169        | 0.0   | 9    | 4.8234          |
+| 4.813         | 0.0   | 10   | 4.7878          |
+| 4.2111        | 0.0   | 11   | 4.7576          |
+| 4.6504        | 0.0   | 12   | 4.7314          |
+| 3.7923        | 0.0   | 13   | 4.7116          |
+| 3.7773        | 0.0   | 14   | 4.6890          |
+| 4.6773        | 0.0   | 15   | 4.6616          |
+| 3.0179        | 0.0   | 16   | 4.6329          |
+| 3.8922        | 0.0   | 17   | 4.6099          |
+| 4.3289        | 0.0   | 18   | 4.5940          |
+| 5.0925        | 0.0   | 19   | 4.5822          |
+| 4.6499        | 0.0   | 20   | 4.5711          |
+| 3.9758        | 0.0   | 21   | 4.5585          |
+| 4.593         | 0.0   | 22   | 4.5454          |
+| 5.2496        | 0.0   | 23   | 4.5346          |
+| 4.2548        | 0.0   | 24   | 4.5217          |
+| 3.5209        | 0.0   | 25   | 4.5059          |
+| 4.4781        | 0.0   | 26   | 4.4930          |
+| 5.4472        | 0.0   | 27   | 4.4834          |
+| 4.1987        | 0.0   | 28   | 4.4756          |
+| 5.2324        | 0.0   | 29   | 4.4684          |
+| 4.8068        | 0.0   | 30   | 4.4593          |
+| 3.5455        | 0.0   | 31   | 4.4521          |
+| 3.6516        | 0.0   | 32   | 4.4415          |
+| 4.1368        | 0.0   | 33   | 4.4289          |
+| 6.4659        | 0.0   | 34   | 4.4289          |
+| 3.434         | 0.0   | 35   | 4.4173          |
+| 3.9518        | 0.0   | 36   | 4.4085          |
+| 3.0758        | 0.0   | 37   | 4.4008          |
+| 3.6492        | 0.0   | 38   | 4.3930          |
+| 4.0352        | 0.0   | 39   | 4.3857          |
+| 5.6527        | 0.0   | 40   | 4.3799          |
+| 4.233         | 0.0   | 41   | 4.3747          |
+| 5.4082        | 0.0   | 42   | 4.3702          |
+| 5.1255        | 0.0   | 43   | 4.3661          |
+| 4.4567        | 0.0   | 44   | 4.3622          |
+| 4.1874        | 0.0   | 45   | 4.3587          |
+| 4.3441        | 0.0   | 46   | 4.3555          |
+| 4.1636        | 0.0   | 47   | 4.3524          |
+| 4.3146        | 0.0   | 48   | 4.3495          |
+| 4.6414        | 0.0   | 49   | 4.3473          |
+| 4.3666        | 0.0   | 50   | 4.3451          |
+| 3.8627        | 0.0   | 51   | 4.3427          |
+| 4.5875        | 0.0   | 52   | 4.3406          |
+| 6.0364        | 0.0   | 53   | 4.3387          |
+| 4.5669        | 0.0   | 54   | 4.3369          |
+| 4.5585        | 0.0   | 55   | 4.3353          |
+| 2.7858        | 0.0   | 56   | 4.3340          |
+| 4.1845        | 0.0   | 57   | 4.3329          |
+| 4.4489        | 0.0   | 58   | 4.3319          |
+| 5.3263        | 0.0   | 59   | 4.3311          |
+| 5.3856        | 0.0   | 60   | 4.3306          |
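The validation losses in the table above can be read as perplexities. This is a minimal sketch, assuming the reported loss is a mean per-token cross-entropy in nats (the usual convention for causal language modeling, but not stated in the card); the function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    first_eval = 5.0205  # validation loss at step 1 of this run
    last_eval = 4.3306   # validation loss at step 60 of this run
    print(f"step 1:  {perplexity(first_eval):.1f}")
    print(f"step 60: {perplexity(last_eval):.1f}")
```

Because perplexity is exponential in the loss, the steady decline in validation loss over the 60 steps corresponds to a larger relative drop in perplexity.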
 ### Framework versions

adapter_config.json CHANGED

@@ -20,11 +20,11 @@
   "revision": null,
   "target_modules": [
     "o_proj",
     "k_proj",
-    "down_proj",
     "q_proj",
-    "gate_proj",
     "v_proj",
     "up_proj"
   ],
   "task_type": "CAUSAL_LM",

   "revision": null,
   "target_modules": [
     "o_proj",
+    "gate_proj",
     "k_proj",
     "q_proj",
     "v_proj",
+    "down_proj",
     "up_proj"
   ],
   "task_type": "CAUSAL_LM",
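The `adapter_config.json` change above only moves `gate_proj` and `down_proj` within `target_modules`; no module is added or removed. The check below confirms this by comparing the two lists as sets. A plausible cause, stated here as an assumption, is that the library holds these names in an unordered collection before serializing them, so the order can differ between saves.

```python
# target_modules as they appear on each side of the diff
old_targets = ["o_proj", "k_proj", "down_proj", "q_proj",
               "gate_proj", "v_proj", "up_proj"]
new_targets = ["o_proj", "gate_proj", "k_proj", "q_proj",
               "v_proj", "down_proj", "up_proj"]


def same_modules(a: list, b: list) -> bool:
    """True when both lists name the same modules, ignoring order."""
    return len(a) == len(b) and set(a) == set(b)


if __name__ == "__main__":
    print(same_modules(old_targets, new_targets))
    print(sorted(set(old_targets) ^ set(new_targets)))  # modules present on one side only
```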

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9ab5e319b196c0a9e19d54444d3dbcdb021ed662550ce77e83744a1efff6fae1
 size 39256456

 version https://git-lfs.github.com/spec/v1
+oid sha256:4733cf3be9a1b1cc1eef237896f0e87d180f06a6a87daa8b0cd4680695190f31
 size 39256456

runs/Mar11_14-41-39_byo-WS5/events.out.tfevents.1710164560.byo-WS5.257567.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3b98492ae3c6e53a0d1209f05d5c1a27b8e7d66741a736473837cbcb1f0666a6
+size 33777

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ffd8063a4009b25c6ac0b77f6bd5247365eaa588138918216b9109515de9911
 size 4920

 version https://git-lfs.github.com/spec/v1
+oid sha256:913b6d228e7de3dfa8097bed24d53204101f2f9000e4bf09c36c0303cedca400
 size 4920
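The three binary files in this commit are stored as Git LFS pointers: a `version` line, an `oid` with a hash algorithm and digest, and a `size` in bytes. The sketch below parses such a pointer and checks a payload against it; `LfsPointer`, `parse_lfs_pointer`, and `matches` are hypothetical helper names, and the example text is the new `adapter_model.safetensors` pointer from the diff above.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


def matches(pointer: LfsPointer, payload: bytes) -> bool:
    """Check that payload has the size and SHA-256 digest the pointer records."""
    return (len(payload) == pointer.size
            and hashlib.sha256(payload).hexdigest() == pointer.digest)


EXAMPLE = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4733cf3be9a1b1cc1eef237896f0e87d180f06a6a87daa8b0cd4680695190f31
size 39256456
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(EXAMPLE)
    print(pointer.hash_algo, pointer.size)
```

For `adapter_model.safetensors` and `training_args.bin` the `oid` changes while `size` stays the same, so the stored bytes differ but the file length does not.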