metadata
license: apache-2.0
base_model: EleutherAI/pythia-160m
tags:
- generated_from_trainer
model-index:
- name: pythia_160m_alpaca_farm_instructions_sft_constant_pa_seed_1
  results: []
pythia_160m_alpaca_farm_instructions_sft_constant_pa_seed_1
This model is a fine-tuned version of EleutherAI/pythia-160m on an unknown dataset. It achieves the following results on the evaluation set at the end of training (step 7500, epoch 3.0):
- Loss: 2.1686
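For a causal language model, an evaluation loss like the one above is usually the mean per-token cross-entropy in nats, in which case it converts to perplexity by exponentiation. The card does not state how the loss is reduced, so the sketch below rests on that assumption; `perplexity_from_loss` is a hypothetical helper name.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


# Evaluation loss reported in this card.
final_eval_loss = 2.1686

# Assumption: the loss is a per-token average in nats.
final_perplexity = perplexity_from_loss(final_eval_loss)
print(f"perplexity = {final_perplexity:.2f}")
```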
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 1
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 3.0
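The hyperparameters above could be expressed as keyword arguments for `transformers.TrainingArguments`, roughly as sketched below. This is a reconstruction rather than the original training script: `output_dir` is a hypothetical path, `train_batch_size` and `eval_batch_size` are assumed to be per-device values, and the number of GPUs is not recorded in the card.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
training_kwargs = {
    "output_dir": "pythia_160m_sft_output",  # hypothetical; not stated in the card
    "learning_rate": 8e-6,
    "per_device_train_batch_size": 4,  # assumes train_batch_size is per device
    "per_device_eval_batch_size": 8,   # assumes eval_batch_size is per device
    "seed": 1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 3.0,
}
# distributed_type (multi-GPU) is determined by the launcher, not by these arguments.


def build_training_arguments(kwargs):
    """Build TrainingArguments; transformers is imported lazily so the
    dictionary above can be inspected without the library installed."""
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)
```

Calling `build_training_arguments(training_kwargs)` requires `transformers` and a compatible PyTorch installation.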
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.3096 | 0.02 | 50 | 2.2544 |
2.2692 | 0.04 | 100 | 2.2374 |
2.2021 | 0.06 | 150 | 2.2228 |
2.2268 | 0.08 | 200 | 2.2338 |
2.1433 | 0.1 | 250 | 2.2146 |
2.0708 | 0.12 | 300 | 2.2004 |
2.163 | 0.14 | 350 | 2.1996 |
2.2518 | 0.16 | 400 | 2.1898 |
2.0717 | 0.18 | 450 | 2.1899 |
2.2137 | 0.2 | 500 | 2.1847 |
2.2232 | 0.22 | 550 | 2.1760 |
2.2455 | 0.24 | 600 | 2.1757 |
2.1936 | 0.26 | 650 | 2.1732 |
2.1352 | 0.28 | 700 | 2.1619 |
2.1215 | 0.3 | 750 | 2.1608 |
2.1568 | 0.32 | 800 | 2.1506 |
2.1319 | 0.34 | 850 | 2.1514 |
2.0831 | 0.36 | 900 | 2.1494 |
2.0788 | 0.38 | 950 | 2.1430 |
2.0901 | 0.4 | 1000 | 2.1376 |
2.1374 | 0.42 | 1050 | 2.1343 |
1.9484 | 0.44 | 1100 | 2.1298 |
2.204 | 0.46 | 1150 | 2.1284 |
2.108 | 0.48 | 1200 | 2.1249 |
1.9353 | 0.5 | 1250 | 2.1210 |
2.1352 | 0.52 | 1300 | 2.1178 |
1.9498 | 0.54 | 1350 | 2.1162 |
2.1571 | 0.56 | 1400 | 2.1153 |
2.1804 | 0.58 | 1450 | 2.1114 |
1.988 | 0.6 | 1500 | 2.1107 |
2.0485 | 0.62 | 1550 | 2.1055 |
2.0596 | 0.64 | 1600 | 2.1020 |
1.98 | 0.66 | 1650 | 2.1027 |
2.0626 | 0.68 | 1700 | 2.0980 |
2.097 | 0.7 | 1750 | 2.0949 |
2.2013 | 0.72 | 1800 | 2.0893 |
2.1234 | 0.74 | 1850 | 2.0913 |
1.9662 | 0.76 | 1900 | 2.0971 |
2.138 | 0.78 | 1950 | 2.0929 |
2.0816 | 0.8 | 2000 | 2.0898 |
2.1506 | 0.82 | 2050 | 2.0848 |
2.0585 | 0.84 | 2100 | 2.0860 |
2.099 | 0.86 | 2150 | 2.0862 |
2.084 | 0.88 | 2200 | 2.0816 |
2.1046 | 0.9 | 2250 | 2.0790 |
2.02 | 0.92 | 2300 | 2.0865 |
2.0548 | 0.94 | 2350 | 2.0776 |
2.0819 | 0.96 | 2400 | 2.0766 |
1.9181 | 0.98 | 2450 | 2.0755 |
2.0345 | 1.0 | 2500 | 2.0793 |
1.7741 | 1.02 | 2550 | 2.0922 |
1.6556 | 1.04 | 2600 | 2.0921 |
1.6168 | 1.06 | 2650 | 2.0921 |
1.8017 | 1.08 | 2700 | 2.0927 |
1.8055 | 1.1 | 2750 | 2.0893 |
1.7298 | 1.12 | 2800 | 2.0910 |
1.6924 | 1.14 | 2850 | 2.0969 |
1.853 | 1.16 | 2900 | 2.0951 |
1.7641 | 1.18 | 2950 | 2.1020 |
1.7529 | 1.2 | 3000 | 2.0991 |
1.7556 | 1.22 | 3050 | 2.1005 |
1.7273 | 1.24 | 3100 | 2.0984 |
1.8478 | 1.26 | 3150 | 2.1000 |
1.8965 | 1.28 | 3200 | 2.0932 |
1.761 | 1.3 | 3250 | 2.0917 |
1.7579 | 1.32 | 3300 | 2.0943 |
1.7347 | 1.34 | 3350 | 2.0914 |
1.7725 | 1.36 | 3400 | 2.0928 |
1.8931 | 1.38 | 3450 | 2.0913 |
1.7301 | 1.4 | 3500 | 2.1030 |
1.741 | 1.42 | 3550 | 2.0953 |
1.8009 | 1.44 | 3600 | 2.0971 |
1.8397 | 1.46 | 3650 | 2.0932 |
1.7941 | 1.48 | 3700 | 2.0932 |
1.7136 | 1.5 | 3750 | 2.0936 |
1.723 | 1.52 | 3800 | 2.0913 |
1.7837 | 1.54 | 3850 | 2.0878 |
1.7988 | 1.56 | 3900 | 2.0859 |
1.7759 | 1.58 | 3950 | 2.0883 |
1.8608 | 1.6 | 4000 | 2.0926 |
1.5859 | 1.62 | 4050 | 2.0918 |
1.8474 | 1.64 | 4100 | 2.0888 |
1.7921 | 1.66 | 4150 | 2.0932 |
1.755 | 1.68 | 4200 | 2.0950 |
1.8437 | 1.7 | 4250 | 2.0880 |
1.826 | 1.72 | 4300 | 2.0861 |
1.8548 | 1.74 | 4350 | 2.0886 |
1.7668 | 1.76 | 4400 | 2.0832 |
1.7818 | 1.78 | 4450 | 2.0877 |
1.8981 | 1.8 | 4500 | 2.0900 |
1.9266 | 1.82 | 4550 | 2.0855 |
1.8589 | 1.84 | 4600 | 2.0795 |
1.7587 | 1.86 | 4650 | 2.0833 |
1.6735 | 1.88 | 4700 | 2.0886 |
1.7961 | 1.9 | 4750 | 2.0874 |
1.8099 | 1.92 | 4800 | 2.0801 |
1.8481 | 1.94 | 4850 | 2.0802 |
1.8418 | 1.96 | 4900 | 2.0774 |
1.8471 | 1.98 | 4950 | 2.0876 |
1.829 | 2.0 | 5000 | 2.0820 |
1.4073 | 2.02 | 5050 | 2.1485 |
1.4951 | 2.04 | 5100 | 2.1651 |
1.4291 | 2.06 | 5150 | 2.1522 |
1.3912 | 2.08 | 5200 | 2.1545 |
1.5581 | 2.1 | 5250 | 2.1462 |
1.5533 | 2.12 | 5300 | 2.1613 |
1.5436 | 2.14 | 5350 | 2.1562 |
1.4632 | 2.16 | 5400 | 2.1437 |
1.5859 | 2.18 | 5450 | 2.1563 |
1.4974 | 2.2 | 5500 | 2.1749 |
1.464 | 2.22 | 5550 | 2.1648 |
1.4689 | 2.24 | 5600 | 2.1623 |
1.565 | 2.26 | 5650 | 2.1656 |
1.5491 | 2.28 | 5700 | 2.1696 |
1.5382 | 2.3 | 5750 | 2.1659 |
1.4154 | 2.32 | 5800 | 2.1614 |
1.4636 | 2.34 | 5850 | 2.1570 |
1.4858 | 2.36 | 5900 | 2.1634 |
1.4295 | 2.38 | 5950 | 2.1897 |
1.6108 | 2.4 | 6000 | 2.1653 |
1.4283 | 2.42 | 6050 | 2.1633 |
1.4685 | 2.44 | 6100 | 2.1720 |
1.4443 | 2.46 | 6150 | 2.1618 |
1.4918 | 2.48 | 6200 | 2.1577 |
1.5742 | 2.5 | 6250 | 2.1665 |
1.49 | 2.52 | 6300 | 2.1697 |
1.552 | 2.54 | 6350 | 2.1489 |
1.5577 | 2.56 | 6400 | 2.1660 |
1.4348 | 2.58 | 6450 | 2.1766 |
1.5508 | 2.6 | 6500 | 2.1564 |
1.4666 | 2.62 | 6550 | 2.1644 |
1.4784 | 2.64 | 6600 | 2.1611 |
1.6065 | 2.66 | 6650 | 2.1770 |
1.559 | 2.68 | 6700 | 2.1635 |
1.5579 | 2.7 | 6750 | 2.1605 |
1.5103 | 2.72 | 6800 | 2.1735 |
1.5369 | 2.74 | 6850 | 2.1711 |
1.6012 | 2.76 | 6900 | 2.1650 |
1.5058 | 2.78 | 6950 | 2.1683 |
1.6553 | 2.8 | 7000 | 2.1613 |
1.5858 | 2.82 | 7050 | 2.1664 |
1.6428 | 2.84 | 7100 | 2.1566 |
1.4619 | 2.86 | 7150 | 2.1620 |
1.5989 | 2.88 | 7200 | 2.1571 |
1.6181 | 2.9 | 7250 | 2.1598 |
1.5831 | 2.92 | 7300 | 2.1560 |
1.555 | 2.94 | 7350 | 2.1529 |
1.5387 | 2.96 | 7400 | 2.1593 |
1.5477 | 2.98 | 7450 | 2.1608 |
1.4989 | 3.0 | 7500 | 2.1686 |
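In the table above, validation loss reaches its minimum (2.0755) at step 2450, then rises through the third epoch while training loss keeps falling. That pattern is consistent with overfitting after roughly one epoch. The sketch below shows one way to locate the best checkpoint from rows in this format. The helper names are hypothetical, and the excerpt contains only four rows copied from the table; pasting the full table works the same way.

```python
def parse_results(table_text):
    """Parse 'train_loss | epoch | step | val_loss |' rows into dicts."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        try:
            train_loss, epoch, step, val_loss = (float(c) for c in cells)
        except ValueError:
            continue  # skip the header, the separator, and malformed lines
        rows.append(
            {
                "train_loss": train_loss,
                "epoch": epoch,
                "step": int(step),
                "val_loss": val_loss,
            }
        )
    return rows


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r["val_loss"])


# Excerpt of the table: the minimum row plus the three end-of-epoch rows.
excerpt = """
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.9181 | 0.98 | 2450 | 2.0755 |
2.0345 | 1.0 | 2500 | 2.0793 |
1.829 | 2.0 | 5000 | 2.0820 |
1.4989 | 3.0 | 7500 | 2.1686 |
"""

rows = parse_results(excerpt)
best = best_checkpoint(rows)
print(f"lowest validation loss {best['val_loss']} at step {best['step']}")
```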
Framework versions
- Transformers 4.36.2
- Pytorch 1.13.1+cu117
- Datasets 2.17.1
- Tokenizers 0.15.2
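To check whether a local environment matches the versions listed above, a comparison along these lines may help. It is a sketch that assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; `find_version_mismatches` is a hypothetical helper, and the `+cu117` suffix only matches a PyTorch build installed with that CUDA variant.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.36.2",
    "torch": "1.13.1+cu117",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, installed)} for every package whose
    installed version differs; installed is None if the package is missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    found = find_version_mismatches(EXPECTED_VERSIONS)
    for package, (wanted, installed) in found.items():
        print(f"{package}: expected {wanted}, found {installed}")
```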