---
base_model: EleutherAI/pythia-160m-deduped
library_name: transformers
license: apache-2.0
tags:
  - axolotl
  - relora
  - generated_from_trainer
model-index:
  - name: pythia-160m-dolphin-extended
    results: []
datasets:
  - cognitivecomputations/dolphin
  - llamafactory/alpaca_gpt4_en
language:
  - en
metrics:
  - accuracy
  - bleu
  - rouge
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
base_model: EleutherAI/pythia-160m-deduped
load_in_8bit:
datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
  - path: llamafactory/alpaca_gpt4_en
    type: alpaca
  - path: cognitivecomputations/dolphin
    name: flan1m-alpaca-uncensored
    type: alpaca
    shards: 10
dataset_prepared_path: ds-mega-alpaca
#dataset_shard_num: 10
chat_template: inst
val_set_size: 0.001
adapter: lora
lora_model_dir:
sequence_len: 2048
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - query_key_value
lora_target_linear:
lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
lora_modules_to_save:
  - embed_in
  - embed_out
  - lm_head
lora_on_cpu: false

# ReLoRA configuration
# Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
# relora_steps:        # Number of steps per ReLoRA restart
# relora_warmup_steps: # Number of per-restart warmup steps
# relora_anneal_steps: # Number of anneal steps for each relora cycle
# relora_prune_ratio:  # Threshold for optimizer magnitude when pruning
# relora_cpu_offload:  # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
relora_steps: 600
relora_warmup_steps: 10
relora_cpu_offload: true

wandb_project: pythia
wandb_entity:
wandb_watch:
wandb_name: pythia-160m-dolphin-extended
wandb_log_model:

output_dir: ./outputs/lora-alpaca-pythia-160m-dolphin-extended
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
learning_rate: 0.0004
lr_scheduler: cosine_with_restarts
#cosine_min_lr_ratio: 0.1
train_on_inputs: false
group_by_length: false
#bf16: auto
#fp16: true
#tf32: false
float16: true
flash_attn:
xformers_attention: true
optimizer: paged_adamw_8bit
gpu_memory_limit: 8GiB
hub_model_id: jtatman/pythia-160m-dolphin-extended
early_stopping_patience: 10
#resume_from_checkpoint: outputs/lora-alpaca-pythia-160m-dolphin-extended/checkpoint-11400
auto_resume_from_checkpoints: true
local_rank:
weight_decay: 0.0
#evals_per_epoch: 4
eval_steps: 200
logging_steps: 1
save_steps: 200
save_total_limit: 5
warmup_steps: 100
tokens:
  - "[INST]"
  - "[/INST]"
```

</details><br>
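
The commented ReLoRA block above is the heart of this run: a rank-16 LoRA adapter is trained as usual, but every `relora_steps: 600` optimizer steps the adapter is merged into the base weights (on CPU here, per `relora_cpu_offload: true`) and then reinitialized, so each cycle learns a fresh low-rank update and the accumulated update can exceed the rank of any single LoRA pass. A minimal sketch of that cycle follows; the `LoRALinear` module and toy training loop are illustrative assumptions, not Axolotl's implementation, and the per-restart warmup (`relora_warmup_steps: 10`) and optimizer-state pruning (`relora_prune_ratio`) are only noted in comments.

```python
# Illustrative ReLoRA cycle: train LoRA -> merge into base -> reinit -> repeat.
# NOT Axolotl's implementation; LoRALinear is a hypothetical toy module.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable rank-r (LoRA) delta."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # base stays frozen
            p.requires_grad_(False)
        self.scaling = alpha / r                    # lora_alpha / lora_r
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

    @torch.no_grad()
    def merge_and_reinit(self, cpu_offload: bool = True):
        """ReLoRA restart: fold the low-rank delta into the frozen base
        weight, then re-zero the adapter for the next cycle."""
        delta = (self.lora_b @ self.lora_a) * self.scaling
        if cpu_offload:                             # relora_cpu_offload: true
            merged = self.base.weight.data.cpu() + delta.cpu()
            self.base.weight.data.copy_(merged.to(self.base.weight.device))
        else:
            self.base.weight.data.add_(delta)
        nn.init.normal_(self.lora_a, std=0.01)      # fresh low-rank directions
        nn.init.zeros_(self.lora_b)                 # delta starts at zero again

layer = LoRALinear(nn.Linear(64, 64))
optimizer = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=4e-4)
RELORA_STEPS = 600                                  # relora_steps in the config

for step in range(1, 3 * RELORA_STEPS + 1):
    loss = layer(torch.randn(8, 64)).pow(2).mean()  # toy objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % RELORA_STEPS == 0:
        layer.merge_and_reinit()
        # Real ReLoRA prunes optimizer moments by magnitude
        # (relora_prune_ratio) and re-warms the LR for
        # relora_warmup_steps; clearing state is a crude stand-in.
        optimizer.state.clear()
```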

# pythia-160m-dolphin-extended

This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on a mix of [vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4), [llamafactory/alpaca_gpt4_en](https://huggingface.co/datasets/llamafactory/alpaca_gpt4_en), and the flan1m-alpaca-uncensored split of [cognitivecomputations/dolphin](https://huggingface.co/datasets/cognitivecomputations/dolphin).
It achieves the following results on the evaluation set:
- Loss: 6.6729

## Model description

`pythia-160m-dolphin-extended` is EleutherAI/pythia-160m-deduped instruction-tuned with ReLoRA via Axolotl: a rank-16 LoRA adapter on `query_key_value`, merged into the base weights and restarted every 600 steps. The embedding layers (`embed_in`, `embed_out`, `lm_head`) were trained alongside the adapter, and the tokenizer was extended with `[INST]` and `[/INST]` instruction tokens.

## Intended uses & limitations

At 160M parameters this is a small experimental model, suited to prototyping, instruction-format testing, and resource-constrained environments. Given the evaluation loss above and the near-chance benchmark scores below, its outputs should not be relied on for factual accuracy, and the usual caveats for uncensored training data (the dolphin flan1m-alpaca-uncensored split) apply.

## Training and evaluation data

Training used three Alpaca-format datasets: vicgalle/alpaca-gpt4, llamafactory/alpaca_gpt4_en, and a one-tenth shard (`shards: 10`) of the flan1m-alpaca-uncensored subset of cognitivecomputations/dolphin. A 0.1% split (`val_set_size: 0.001`) was held out for evaluation.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 (`paged_adamw_8bit` in the Axolotl config)
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 100
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 25.9906       | 0.0001 | 1     | 29.5342         |
| 21.1303       | 0.0167 | 200   | 20.2350         |
| 16.5026       | 0.0334 | 400   | 18.4930         |
| 17.2725       | 0.0500 | 600   | 16.3395         |
| 11.9697       | 0.0667 | 800   | 12.1401         |
| 11.3783       | 0.0834 | 1000  | 11.8383         |
| 12.8084       | 0.1001 | 1200  | 12.9667         |
| 9.4119        | 0.1167 | 1400  | 9.8787          |
| 10.3527       | 0.1334 | 1600  | 10.0560         |
| 9.3545        | 0.1501 | 1800  | 9.7355          |
| 8.9165        | 0.1668 | 2000  | 9.1513          |
| 8.5467        | 0.1835 | 2200  | 8.2025          |
| 7.9152        | 0.2001 | 2400  | 7.6616          |
| 7.3362        | 0.2168 | 2600  | 7.5699          |
| 7.9374        | 0.2335 | 2800  | 7.4818          |
| 7.838         | 0.2502 | 3000  | 7.4635          |
| 7.5731        | 0.2668 | 3200  | 7.4899          |
| 7.8289        | 0.2835 | 3400  | 7.3594          |
| 7.8906        | 0.3002 | 3600  | 8.0934          |
| 7.7318        | 0.3169 | 3800  | 7.5812          |
| 7.2089        | 0.3335 | 4000  | 7.4839          |
| 7.202         | 0.3502 | 4200  | 7.4486          |
| 6.9493        | 0.3669 | 4400  | 7.3208          |
| 7.1492        | 0.3836 | 4600  | 7.2469          |
| 7.3443        | 0.4003 | 4800  | 7.1378          |
| 7.7056        | 0.4169 | 5000  | 7.1385          |
| 55.0553       | 0.4336 | 5200  | 50.0135         |
| 7.1868        | 0.4503 | 5400  | 6.9898          |
| 6.5803        | 0.4670 | 5600  | 6.9559          |
| 8.6171        | 0.4836 | 5800  | 7.9075          |
| 7.1373        | 0.5003 | 6000  | 6.9280          |
| 6.7077        | 0.5170 | 6200  | 6.8797          |
| 7.0026        | 0.5337 | 6400  | 6.8635          |
| 6.6797        | 0.5504 | 6600  | 6.8178          |
| 6.8067        | 0.5670 | 6800  | 6.7893          |
| 6.5979        | 0.5837 | 7000  | 6.8106          |
| 6.7283        | 0.6004 | 7200  | 6.7998          |
| 7.0015        | 0.6171 | 7400  | 6.7705          |
| 6.1182        | 0.6337 | 7600  | 6.7592          |
| 6.7919        | 0.6504 | 7800  | 6.7446          |
| 6.4523        | 0.6671 | 8000  | 6.7260          |
| 6.765         | 0.6838 | 8200  | 6.7135          |
| 6.4625        | 0.7004 | 8400  | 6.7099          |
| 6.79          | 0.7171 | 8600  | 6.7070          |
| 6.6101        | 0.7338 | 8800  | 6.7017          |
| 6.7541        | 0.7505 | 9000  | 6.6964          |
| 6.7777        | 0.7672 | 9200  | 6.6901          |
| 7.2082        | 0.7838 | 9400  | 6.6869          |
| 6.4263        | 0.8005 | 9600  | 6.6875          |
| 6.1944        | 0.8172 | 9800  | 6.6803          |
| 6.7745        | 0.8339 | 10000 | 6.6865          |
| 6.6746        | 0.8505 | 10200 | 6.6756          |
| 6.6319        | 0.8672 | 10400 | 6.6941          |
| 6.6657        | 0.8839 | 10600 | 6.6764          |
| 6.8516        | 0.9006 | 10800 | 6.6776          |
| 6.6391        | 0.9173 | 11000 | 6.6749          |
| 6.5763        | 0.9339 | 11200 | 6.6729          |
| 6.585         | 0.9506 | 11400 | 6.6694          |
| 6.2999        | 0.9673 | 11600 | 6.6722          |
| 6.8343        | 0.9840 | 11800 | 6.6729          |

Training was stable apart from a transient divergence around step 5200 (validation loss 50.01) that recovered by step 5400.

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

### Evaluation Results
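
The table below is reproduced as emitted by the evaluation run; its format (Groups/Filter/n-shot columns) matches EleutherAI's lm-evaluation-harness, so a re-run would look something like the sketch below. The task list is read off the table, while the harness version, few-shot defaults, and batch size are assumptions.

```python
# Hedged reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Tasks are taken from the table below; few-shot
# settings and batch size are assumptions and may need adjusting.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jtatman/pythia-160m-dolphin-extended,dtype=float16",
    tasks=["winogrande", "gsm8k", "hellaswag", "mmlu", "truthfulqa"],
    batch_size=8,
)
print(results["results"])
```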
| Groups               | Version | Filter           | n-shot | Metric      |   Value | ± | Stderr |
|----------------------|--------:|------------------|-------:|-------------|--------:|---|-------:|
| Open LLM Leaderboard | N/A     | none             |      5 | rouge2_max  | 16.4873 | ± | 1.0172 |
| - winogrande         | 1       | none             |      5 | acc         |  0.5120 | ± | 0.0224 |
| - gsm8k              | 3       | strict-match     |      5 | exact_match |  0.0060 | ± | 0.0035 |
| - hellaswag          | 1       | none             |     10 | acc         |  0.3520 | ± | 0.0214 |
| - mmlu               | N/A     | none             |      0 | acc         |  0.2533 | ± | 0.0039 |
|                      |         | none             |      5 | rouge2_acc  |  0.1920 | ± | 0.0176 |
|                      |         | none             |      5 | rougeL_acc  |  0.3860 | ± | 0.0218 |
|                      |         | flexible-extract |      5 | exact_match |  0.0220 | ± | 0.0066 |
|                      |         | strict-match     |      5 | exact_match |  0.0060 | ± | 0.0035 |
|                      |         | none             |      5 | rougeL_diff | -0.7765 | ± | 1.0034 |
|                      |         | none             |      5 | rouge1_acc  |  0.3700 | ± | 0.0216 |
|                      |         | none             |      5 | rouge1_diff | -1.5564 | ± | 1.0223 |
|                      |         | none             |      5 | acc_norm    |  0.3180 | ± | 0.0145 |
|                      |         | none             |      5 | bleu_diff   | -0.6500 | ± | 0.6421 |
|                      |         | none             |      5 | rouge1_max  | 36.3550 | ± | 0.9462 |
|                      |         | none             |      5 | acc         |  0.2664 | ± | 0.0036 |
|                      |         | none             |      5 | rougeL_max  | 33.8798 | ± | 0.9367 |
|                      |         | none             |      5 | bleu_max    | 15.2292 | ± | 0.6714 |
|                      |         | none             |      5 | bleu_acc    |  0.4360 | ± | 0.0222 |
|                      |         | none             |      5 | rouge2_diff | -3.3178 | ± | 0.9477 |
| - mmlu               | N/A     | none             |      0 | acc         |  0.2533 | ± | 0.0039 |
| - humanities         | N/A     | none             |      5 | acc         |  0.2408 | ± | 0.0075 |
| - other              | N/A     | none             |      5 | acc         |  0.2443 | ± | 0.0080 |
| - social_sciences    | N/A     | none             |      5 | acc         |  0.2538 | ± | 0.0081 |
| - stem               | N/A     | none             |      5 | acc         |  0.2740 | ± | 0.0079 |
| - truthfulqa         | N/A     | none             |      0 | rouge2_max  | 16.4873 | ± | 1.0172 |
|                      |         | none             |      0 | rouge2_acc  |  0.1920 | ± | 0.0176 |
|                      |         | none             |      0 | rougeL_acc  |  0.3860 | ± | 0.0218 |
|                      |         | none             |      0 | rougeL_diff | -0.7765 | ± | 1.0034 |
|                      |         | none             |      0 | rouge1_acc  |  0.3700 | ± | 0.0216 |
|                      |         | none             |      0 | rouge1_diff | -1.5564 | ± | 1.0223 |
|                      |         | none             |      0 | bleu_diff   | -0.6500 | ± | 0.6421 |
|                      |         | none             |      0 | rouge1_max  | 36.3550 | ± | 0.9462 |
|                      |         | none             |      0 | acc         |  0.3435 | ± | 0.0137 |
|                      |         | none             |      0 | rougeL_max  | 33.8798 | ± | 0.9367 |
|                      |         | none             |      0 | bleu_max    | 15.2292 | ± | 0.6714 |
|                      |         | none             |      0 | bleu_acc    |  0.4360 | ± | 0.0222 |
|                      |         | none             |      0 | rouge2_diff | -3.3178 | ± | 0.9477 |
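
### Usage example

A minimal inference sketch, inferred from the config rather than an official snippet: the tokenizer was extended with `[INST]`/`[/INST]`, so prompts follow that format, and `float16` matches the training dtype.

```python
# Minimal inference sketch; prompt format inferred from the added
# [INST]/[/INST] tokens in the Axolotl config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jtatman/pythia-160m-dolphin-extended"
tokenizer = AutoTokenizer.from_pretrained(repo)
# Drop torch_dtype (or move to GPU) if half precision is unsupported
# on your device.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16)

prompt = "[INST] Name the capital of France. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```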