# sparse_sparse_80_percent_pretraining_warmup_20K_0_2_steps_5k
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on the openwebtext dataset. It achieves the following results on the evaluation set:
- Loss: 4.9832
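For reference: assuming the reported loss is mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Transformers `Trainer`), it corresponds to a perplexity of about exp(4.9832) ≈ 145.9. A minimal check:

```python
import math

# Assumption: the eval loss is mean cross-entropy per token in nats,
# so perplexity is simply its exponential.
eval_loss = 4.9832
print(f"perplexity = {math.exp(eval_loss):.1f}")  # perplexity = 145.9
```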
## Model description
More information needed
## Intended uses & limitations
More information needed
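In the absence of documented usage guidance, here is a minimal, hypothetical loading sketch using the standard Transformers causal-LM API; the repo id is taken from the model name, and the dtype/device settings are assumptions rather than documented requirements:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical usage sketch; nothing below is prescribed by the model card.
model_id = "thrunlab/sparse_sparse_80_percent_pretraining_warmup_20K_0_2_steps_5k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; pick what fits your hardware
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```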
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto the Trainer API follows the list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 2
- total_train_batch_size: 48
- total_eval_batch_size: 48
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 5000
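Since the training script itself is not published, the sketch below only shows how the hyperparameters listed above map onto `transformers.TrainingArguments`; the 80% sparsity schedule and warmup implied by the model name, and the 3-GPU launch configuration, are outside this sketch:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the listed hyperparameters using the standard
# Trainer API. The sparsity-specific logic is not reproduced here.
training_args = TrainingArguments(
    output_dir="sparse_sparse_80_percent_pretraining_warmup_20K_0_2_steps_5k",
    learning_rate=1e-5,
    per_device_train_batch_size=8,   # 8 x 3 GPUs x 2 accum = 48 effective
    per_device_eval_batch_size=16,   # 16 x 3 GPUs = 48 effective
    gradient_accumulation_steps=2,
    seed=0,
    lr_scheduler_type="linear",
    max_steps=5000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```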
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.2964 | 0.02 | 50 | 1.2517 |
| 1.1086 | 0.05 | 100 | 1.0714 |
| 0.9727 | 0.07 | 150 | 0.9857 |
| 0.9326 | 0.1 | 200 | 0.9357 |
| 0.8944 | 0.12 | 250 | 0.8988 |
| 0.872 | 0.15 | 300 | 0.8700 |
| 0.8523 | 0.17 | 350 | 0.8516 |
| 0.8369 | 0.19 | 400 | 0.8358 |
| 0.8372 | 0.22 | 450 | 0.8226 |
| 0.8221 | 0.24 | 500 | 0.8116 |
| 0.8093 | 0.27 | 550 | 0.8020 |
| 0.804 | 0.29 | 600 | 0.7937 |
| 0.8111 | 0.32 | 650 | 0.7935 |
| 0.7949 | 0.34 | 700 | 0.7872 |
| 0.7947 | 0.36 | 750 | 0.7815 |
| 0.8045 | 0.39 | 800 | 0.7771 |
| 0.7706 | 0.41 | 850 | 0.7724 |
| 0.7669 | 0.44 | 900 | 0.7683 |
| 0.7691 | 0.46 | 950 | 0.7825 |
| 0.7737 | 0.48 | 1000 | 0.7779 |
| 0.7595 | 0.51 | 1050 | 0.7748 |
| 0.7672 | 0.53 | 1100 | 0.7709 |
| 0.7725 | 0.56 | 1150 | 0.7681 |
| 0.7551 | 0.58 | 1200 | 0.7658 |
| 0.8035 | 0.61 | 1250 | 0.8159 |
| 0.804 | 0.63 | 1300 | 0.8068 |
| 0.8074 | 0.65 | 1350 | 0.8016 |
| 0.7801 | 0.68 | 1400 | 0.7982 |
| 0.7842 | 0.7 | 1450 | 0.7951 |
| 0.7938 | 0.73 | 1500 | 0.7907 |
| 0.8625 | 0.75 | 1550 | 0.8568 |
| 0.8467 | 0.78 | 1600 | 0.8443 |
| 0.8216 | 0.8 | 1650 | 0.8379 |
| 0.8334 | 0.82 | 1700 | 0.8332 |
| 0.8287 | 0.85 | 1750 | 0.8292 |
| 0.8251 | 0.87 | 1800 | 0.8250 |
| 0.8969 | 0.9 | 1850 | 0.8790 |
| 0.8619 | 0.92 | 1900 | 0.8696 |
| 0.8566 | 0.95 | 1950 | 0.8645 |
| 0.8633 | 0.97 | 2000 | 0.8599 |
| 0.8622 | 0.99 | 2050 | 0.8558 |
| 0.8336 | 1.02 | 2100 | 0.8520 |
| 0.918 | 1.04 | 2150 | 0.9045 |
| 0.8755 | 1.07 | 2200 | 0.8960 |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0