---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd1
  results: []
---


# collapse_gemma-2-2b_hs2_replace_iter9_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), trained with TRL supervised fine-tuning (SFT). The training dataset was not recorded when this card was generated.
It achieves the following results on the evaluation set:
- Loss: 2.6634
- Num Input Tokens Seen: 8155608
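
For a quick sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss copied from the list above.
eval_loss = 2.6634

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```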

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
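
The relationship between these values can be checked with a little arithmetic. The following is a minimal sketch, not the actual training script: it assumes a single training device (which is consistent with 8 x 16 = 128 but is not stated in this card) and estimates the number of optimizer steps per epoch from the results table. The variable names mirror the corresponding `transformers.TrainingArguments` fields; `lr_at` is a hypothetical helper that illustrates what a constant-with-warmup schedule does.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: a single training device (not stated in the card).
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 159 optimizer steps in the epoch, estimated from the
# results table (step 5 is logged at epoch 0.0315).
estimated_total_steps = round(5 / 0.0315)
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then constant (illustrative)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print("effective batch size:", effective_batch)
print("estimated warmup steps:", warmup_steps)
for step in (0, 4, 8, 100):
    print(step, lr_at(step))
```

Under these assumptions the warmup covers only the first few optimizer steps, so nearly the whole epoch runs at the full learning rate of 8e-06.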

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5468        | 0.0315 | 5    | 1.3110          | 253928            |
| 1.1820        | 0.0630 | 10   | 1.2480          | 514960            |
| 0.8084        | 0.0945 | 15   | 1.3189          | 773760            |
| 0.6416        | 0.1259 | 20   | 1.4974          | 1041648           |
| 0.3966        | 0.1574 | 25   | 1.6165          | 1307976           |
| 0.2039        | 0.1889 | 30   | 1.8225          | 1565976           |
| 0.1576        | 0.2204 | 35   | 1.9499          | 1822872           |
| 0.0829        | 0.2519 | 40   | 2.1969          | 2080200           |
| 0.0476        | 0.2834 | 45   | 2.3565          | 2335552           |
| 0.0338        | 0.3148 | 50   | 2.4119          | 2590880           |
| 0.0303        | 0.3463 | 55   | 2.5071          | 2851232           |
| 0.0381        | 0.3778 | 60   | 2.5463          | 3110576           |
| 0.0307        | 0.4093 | 65   | 2.5668          | 3369800           |
| 0.0279        | 0.4408 | 70   | 2.5711          | 3630600           |
| 0.0262        | 0.4723 | 75   | 2.6104          | 3884416           |
| 0.0284        | 0.5037 | 80   | 2.6201          | 4140232           |
| 0.0265        | 0.5352 | 85   | 2.6255          | 4390344           |
| 0.0265        | 0.5667 | 90   | 2.6473          | 4646944           |
| 0.0288        | 0.5982 | 95   | 2.6452          | 4907960           |
| 0.0242        | 0.6297 | 100  | 2.6281          | 5157432           |
| 0.0235        | 0.6612 | 105  | 2.6248          | 5417680           |
| 0.0256        | 0.6926 | 110  | 2.6399          | 5680504           |
| 0.0224        | 0.7241 | 115  | 2.6534          | 5934288           |
| 0.0246        | 0.7556 | 120  | 2.6607          | 6188664           |
| 0.0313        | 0.7871 | 125  | 2.6628          | 6444560           |
| 0.0252        | 0.8186 | 130  | 2.6540          | 6702464           |
| 0.0258        | 0.8501 | 135  | 2.6528          | 6962424           |
| 0.0276        | 0.8815 | 140  | 2.6468          | 7217352           |
| 0.0245        | 0.9130 | 145  | 2.6580          | 7472288           |
| 0.0250        | 0.9445 | 150  | 1.6685          | 7739408           |
| 0.0285        | 0.9760 | 155  | 2.6733          | 8001312           |
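
When reading this table, it can help to locate the checkpoint with the lowest validation loss and compare it with the end of training. The sketch below uses a subset of rows copied from the table; the variable names are illustrative. In these numbers, validation loss bottoms out early and then rises while training loss keeps falling, a pattern usually associated with overfitting, though the card does not say what the run was designed to study.

```python
# (step, training_loss, validation_loss) rows copied from the table above.
# Only a subset is included to keep the example short; step 0 is omitted
# because it has no training loss.
rows = [
    (5, 1.5468, 1.3110),
    (10, 1.1820, 1.2480),
    (15, 0.8084, 1.3189),
    (20, 0.6416, 1.4974),
    (40, 0.0829, 2.1969),
    (80, 0.0284, 2.6201),
    (155, 0.0285, 2.6733),
]

# Row with the smallest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = rows[-1]
final_gap = last_val - last_train

print(f"lowest validation loss {best_val:.4f} at step {best_step}")
print(f"train/validation gap at step {last_step}: {final_gap:.4f}")
```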


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1