phi-2-basic-maths
This model is a fine-tuned version of microsoft/phi-2 on the GSM8K dataset.
Model Description
The objective of this model is to evaluate Phi-2's ability to solve reasoning problems correctly after fine-tuning. It was trained using TRL, LoRA, quantization, and Flash Attention.
To test it, you can use the following code:
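```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline

# Specify the model ID
peft_model_id = "Menouar/phi-2-basic-maths"

# Load the base model together with the PEFT (LoRA) adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Wrap the model and tokenizer in a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Example call; this plain prompt is only an illustration and may not match
# the prompt format used during fine-tuning (see the training notebook)
question = "A baker fills 3 trays with 12 muffins each. How many muffins does he bake?"
outputs = pipe(question, max_new_tokens=256, do_sample=False, return_full_text=False)
print(outputs[0]["generated_text"])
```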
Training procedure
The complete training procedure can be found in my notebook.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 42
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 84
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 30
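Two of the values above are derived from the others. A minimal pure-Python sketch (the function names are hypothetical and not taken from the training code): `total_train_batch_size` is `train_batch_size × gradient_accumulation_steps`, and 42 × 2 = 84 implies training on a single device. As far as we understand the Transformers scheduler logic, the plain `constant` scheduler type has no warmup phase, so the 0.03 warmup ratio would only take effect with `constant_with_warmup`; the sketch contrasts the two under an assumed step count, since the card does not list the number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 42
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_RATIO = 0.03


def effective_batch_size(per_device_batch, grad_accum, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return per_device_batch * grad_accum * num_devices


def lr_at_step(step, total_steps, scheduler="constant"):
    """Learning rate at an optimizer step for two constant-style schedules.

    Intended to mirror Transformers' "constant" and "constant_with_warmup"
    scheduler types; total_steps is a hypothetical value supplied by the caller.
    """
    if scheduler == "constant":
        # No warmup phase: the learning rate is fixed from step 0.
        return LEARNING_RATE
    if scheduler == "constant_with_warmup":
        warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
        if step < warmup_steps:
            # Linear ramp from 0 up to LEARNING_RATE during warmup.
            return LEARNING_RATE * step / max(1, warmup_steps)
        return LEARNING_RATE
    raise ValueError(f"unknown scheduler: {scheduler}")


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 84
print(lr_at_step(0, total_steps=1000))  # 0.0002
print(lr_at_step(0, total_steps=1000, scheduler="constant_with_warmup"))  # 0.0
```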
Training results
The training results can be found on TensorBoard.
Evaluation procedure
The complete evaluation procedure can be found in my notebook.
Accuracy: 36.16%
Unclear answers: 7.81%
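The notebook's exact scoring rules are not reproduced in this card. As a hypothetical sketch, assuming "accuracy" means the last number in the model's output matches the GSM8K reference answer (GSM8K solutions end with a line of the form `#### <answer>`) and "unclear" means no number could be extracted at all, the two percentages could be computed like this. The extraction rule and helper names are assumptions, not the author's method.

```python
import re

# Hypothetical extraction rule: any integer or decimal, with optional thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_number(text):
    """Convert '1,000' or '72' to a float."""
    return float(text.replace(",", ""))


def reference_answer(solution):
    """GSM8K reference solutions end with '#### <answer>'."""
    return parse_number(solution.split("####")[-1].strip())


def predicted_answer(generation):
    """Take the last number in the output; None marks an unclear answer."""
    matches = NUMBER.findall(generation)
    return parse_number(matches[-1]) if matches else None


def score(generations, solutions):
    """Return (accuracy %, unclear %) over paired generations and references."""
    correct = unclear = 0
    for generation, solution in zip(generations, solutions):
        prediction = predicted_answer(generation)
        if prediction is None:
            unclear += 1
        elif abs(prediction - reference_answer(solution)) < 1e-6:
            correct += 1
    n = len(solutions)
    return 100 * correct / n, 100 * unclear / n


accuracy, unclear = score(
    ["So the answer is 72.", "I am not sure.", "Total: 1,000 apples", "It is 5"],
    ["... #### 72", "... #### 10", "... #### 1000", "... #### 6"],
)
print(accuracy, unclear)
```

Answers that contain a number but the wrong one count as neither correct nor unclear, which is why the two percentages need not sum to 100.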
Framework versions
- PEFT 0.8.2
- Transformers 4.38.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---:|
| Avg. | 53.60 |
| AI2 Reasoning Challenge (25-shot) | 55.80 |
| HellaSwag (10-shot) | 71.15 |
| MMLU (5-shot) | 47.27 |
| TruthfulQA (0-shot) | 41.40 |
| Winogrande (5-shot) | 75.30 |
| GSM8k (5-shot) | 30.71 |
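The "Avg." row of the leaderboard table above is, to our understanding, the unweighted mean of the six benchmark scores. Recomputing it from the rounded table values gives 53.605, consistent with the reported 53.60 up to rounding:

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 55.80,
    "HellaSwag (10-shot)": 71.15,
    "MMLU (5-shot)": 47.27,
    "TruthfulQA (0-shot)": 41.40,
    "Winogrande (5-shot)": 75.30,
    "GSM8k (5-shot)": 30.71,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # 53.605
```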
Evaluation results (Open LLM Leaderboard)
- AI2 Reasoning Challenge (25-shot), test set: normalized accuracy 55.80
- HellaSwag (10-shot), validation set: normalized accuracy 71.15
- MMLU (5-shot), test set: accuracy 47.27
- Winogrande (5-shot), validation set: accuracy 75.30
- TruthfulQA (0-shot), validation set: mc2 41.40
- GSM8k (5-shot), test set: accuracy 30.70