|
--- |
|
library_name: peft |
|
datasets: |
|
- qwedsacf/grade-school-math-instructions |
|
language: |
|
- en |
|
metrics: |
|
- perplexity |
|
--- |
|
## Training procedure |
|
|
|
|
|
The following `bitsandbytes` quantization config was used during training (a sketch of the equivalent `BitsAndBytesConfig` follows the list):
|
- load_in_8bit: True |
|
- load_in_4bit: False |
|
- llm_int8_threshold: 6.0 |
|
- llm_int8_skip_modules: None |
|
- llm_int8_enable_fp32_cpu_offload: False |
|
- llm_int8_has_fp16_weight: False |
|
- bnb_4bit_quant_type: fp4 |
|
- bnb_4bit_use_double_quant: False |
|
- bnb_4bit_compute_dtype: float32 |
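For reference, here is roughly how that config maps onto `transformers.BitsAndBytesConfig`. This is a reconstruction from the values above, not the exact code from the training notebook; note the `bnb_4bit_*` fields are inert because `load_in_4bit` is `False`.

```python
import torch
from transformers import BitsAndBytesConfig

# 8-bit quantization config matching the values listed above.
# The bnb_4bit_* fields are carried over for completeness but have
# no effect while load_in_4bit=False.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)
```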
|
|
|
### Model Description |
|
|
|
For more information on how this model was created, see the project notebook: https://github.com/DunnBC22/NLP_Projects/blob/main/OPT%20Models/Grade%20School%20Math%20Instructions%20Fine-Tune%20OPT.ipynb
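As a minimal sketch of how an adapter like this one is typically loaded for inference with PEFT (the adapter repo id below is a placeholder, and the prompt and generation settings are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 8-bit base model the adapter was trained against.
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-2.7b",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")

# "<adapter-repo-or-path>" is a placeholder for wherever these
# PEFT adapter weights are hosted.
model = PeftModel.from_pretrained(base_model, "<adapter-repo-or-path>")

prompt = "A class has 12 students and each student brings 3 pencils. How many pencils are there in total?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```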
|
|
|
### Intended uses & limitations |
|
|
|
This model is intended as a demonstration of parameter-efficient fine-tuning rather than a production system. Its capabilities are mainly limited by the scope and quality of the training data.
|
|
|
### Training & Evaluation Dataset |
|
|
|
Dataset Source: https://huggingface.co/datasets/qwedsacf/grade-school-math-instructions |
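The dataset can be pulled directly with the `datasets` library; this sketch assumes the default `train` split:

```python
from datasets import load_dataset

# Instruction-formatted grade-school math problems used for
# both training and evaluation.
dataset = load_dataset("qwedsacf/grade-school-math-instructions")

print(dataset)              # splits and row counts
print(dataset["train"][0])  # inspect one example
```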
|
|
|
### Hyperparameters Used |
|
|
|
| Hyperparameter | Value |
|:-----:|:-----:|
| Model Checkpoint | facebook/opt-2.7b |
| per_device_train_batch_size | 4 |
| gradient_accumulation_steps | 4 |
| fp16 | True |
| warmup_steps | 225 |
| learning_rate | 2e-4 |
| Training Steps | 450 |
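These values map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a hypothetical name, and any optimizer or logging settings beyond the table are left at their defaults rather than taken from the notebook:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="opt-2.7b-gsm-instructions",  # hypothetical, not from the notebook
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,
    warmup_steps=225,
    learning_rate=2e-4,
    max_steps=450,  # "Training Steps" from the table above
)
```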
|
|
|
|
|
### Framework versions |
|
|
|
| Library | Version |
|:-----:|:-----:|
| Python | 3.10.1 |
| Torch | 2.0.1+cu118 |
| Datasets | 2.14.4 |
| Transformers | 4.31.0 |
| PEFT | 0.4.0 |
|
|
|
|
|
### Metric |
|
|
|
Perplexity = 6.35 |
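Perplexity is presumably computed in the standard way, as the exponential of the mean cross-entropy evaluation loss; a minimal sketch:

```python
import math

# perplexity = exp(mean cross-entropy loss); an eval loss of
# roughly 1.849 corresponds to the reported value (illustrative,
# not taken from the actual training logs).
eval_loss = 1.849
print(round(math.exp(eval_loss), 2))  # ~6.35
```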