---
base_model: Qwen/Qwen2.5-0.5B
datasets: trl-lib/math_shepherd
library_name: transformers
model_name: Qwen2.5-0.5B-Math-Shepherd-PRM-0.2
tags:
- generated_from_trainer
- trl
- stepwise-reward-trainer
licence: license
---
# Model Card for Qwen2.5-0.5B-Math-Shepherd-PRM-0.2
This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [trl-lib/math_shepherd](https://huggingface.co/datasets/trl-lib/math_shepherd) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
Example 1)
```python
import os

# Must be set before importing transformers/datasets, since
# huggingface_hub reads it once at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from datasets import load_dataset
from transformers import pipeline
model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
pipe = pipeline("token-classification", model=model_name, device="cuda")
dataset = load_dataset("trl-lib/math_shepherd")
example = dataset["test"][10]
sep = "\n"
print(sep.join((example["prompt"], *example["completions"])))
for idx in range(1, len(example["completions"])+1):
    text = sep.join((example["prompt"], *example["completions"][0:idx])) + sep
    output = pipe(text)
    score = float(output[-1]["score"])
    pred = output[-1]["entity"] == "LABEL_1"
    print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
# Grandma gave Bryce and Carter some raisins. Bryce received 6 more raisins than Carter, and Carter received half the number of raisins Bryce received. How many raisins did Bryce receive?
# Step 1: Let $b$ be the number of raisins Bryce received and $c$ be the number of raisins Carter received.
# Step 2: We are given that $b = c + 6$ and $c = \frac{1}{2}b$.
# Step 3: Substituting the second equation into the first equation, we get $b = c + 6 = \frac{1}{2}b + 6$.
# Step 4: Simplifying, we have $b = \frac{1}{2}b + 6$.
# Step 5: Subtracting $\frac{1}{2}b$ from both sides, we get $\frac{1}{2}b - b = 6$.
# Step 6: Simplifying further, we have $\frac{1}{2}b - 2b = 6$.
# Step 7: Combining like terms, we have $-\frac{1}{2}b = 6$.
# Step 8: Multiplying both sides by $-2$, we get $b = -12$.
# Step 9: Therefore, Bryce received $\boxed{-12}$ raisins.The answer is: -12
# Step 1 Predicted (score): True (0.99) Label: True
# Step 2 Predicted (score): True (0.99) Label: True
# Step 3 Predicted (score): True (0.94) Label: True
# Step 4 Predicted (score): True (0.82) Label: True
# Step 5 Predicted (score): True (0.58) Label: True
# Step 6 Predicted (score): False (0.62) Label: False
# Step 7 Predicted (score): False (0.77) Label: False
# Step 8 Predicted (score): False (0.91) Label: False
# Step 9 Predicted (score): False (0.97) Label: False
```
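The per-step scoring loop above can be factored into a small reusable helper. A minimal sketch: `score_steps` is a name introduced here for illustration, and `fake_pipe` is a stub standing in for the real token-classification pipeline so the snippet runs without downloading the model.

```python
def score_steps(pipe, prompt, completions, sep="\n"):
    """Score each step prefix of a solution with a PRM pipeline.

    Returns a list of (pred, score) tuples, one per step, where pred is
    True when the pipeline labels the step separator token as LABEL_1.
    """
    results = []
    for idx in range(1, len(completions) + 1):
        # Build the prompt plus the first `idx` steps, ending with the separator
        text = sep.join((prompt, *completions[:idx])) + sep
        output = pipe(text)
        results.append((output[-1]["entity"] == "LABEL_1", float(output[-1]["score"])))
    return results

# Stub pipeline for demonstration only: judges every prefix correct with score 0.9
def fake_pipe(text):
    return [{"entity": "LABEL_1", "score": 0.9}]

print(score_steps(fake_pipe, "Q?", ["Step 1: a", "Step 2: b"]))
# [(True, 0.9), (True, 0.9)]
```

With the real pipeline, `score_steps(pipe, example["prompt"], example["completions"])` reproduces the loop in the examples.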
Example 2)
```python
import os

# Must be set before importing transformers/datasets, since
# huggingface_hub reads it once at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from datasets import load_dataset
from transformers import pipeline
model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
pipe = pipeline("token-classification", model=model_name, device="cuda")
dataset = load_dataset("trl-lib/math_shepherd")
i = 32  # index of the test example to score (try e.g. 10 or 32)
example = dataset["test"][i]
sep = "\n"
print(sep.join((example["prompt"], *example["completions"])))
for idx in range(1, len(example["completions"])+1):
    text = sep.join((example["prompt"], *example["completions"][0:idx])) + sep
    output = pipe(text)
    score = float(output[-1]["score"])
    pred = output[-1]["entity"] == "LABEL_1"
    print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
# In the Golden State Team, each player earned points. Draymond earned 12 points, Curry earned twice the points as Draymond, Kelly earned 9, Durant earned twice the points as Kelly, Klay earned half the points as Draymond. How many points did the Golden States have in total?
# Step 1: Draymond earned 12 points, Curry earned twice the points as Draymond, which is 2*12 = 24 points.
# Step 2: Kelly earned 9 points, Durant earned twice the points as Kelly, which is 2*9 = 18 points.
# Step 3: Klay earned half the points as Draymond, which is 12/2 = <<12/2=6>>6 points.
# Step 4: The Golden State Team had 12+24+9+18+6 = <<12+24+9+18+6=51>>51 points. The answer is: 51
# Step 1 Predicted (score): True (1.00) Label: True
# Step 2 Predicted (score): True (1.00) Label: True
# Step 3 Predicted (score): True (1.00) Label: True
# Step 4 Predicted (score): False (0.96) Label: False
```
Example 3)
This example corresponds to the one shown in the [peiyi9979/math-shepherd-mistral-7b-prm](https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm) model card:
```python
import os

# Must be set before importing transformers, since huggingface_hub
# reads it once at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from transformers import pipeline
model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
pipe = pipeline("token-classification", model=model_name, device="cuda")
examples = [
    {
        "prompt": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "completions": [
            "Step 1: Janet's ducks lay 16 eggs per day.",
            "Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left.",
            "Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left.",
            "Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18",
        ],
        "labels": [True, True, True, True],
    },
    {
        "prompt": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
        "completions": [
            "Step 1: Janet's ducks lay 16 eggs per day.",
            "Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left.",
            "Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left.",
            "Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 17",
        ],
        "labels": [True, True, True, False],
    },
]
sep = "\n"
for i, example in enumerate(examples):
    print(f"- Example {i}:")
    for idx in range(1, len(example["completions"])+1):
        text = sep.join((example["prompt"], *example["completions"][0:idx])) + sep
        output = pipe(text)
        score = float(output[-1]["score"])
        pred = output[-1]["entity"] == "LABEL_1"
        print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
# - Example 0:
# Step 1 Predicted (score): True (0.90) Label: True
# Step 2 Predicted (score): False (0.55) Label: True
# Step 3 Predicted (score): False (0.62) Label: True
# Step 4 Predicted (score): False (0.90) Label: True
# - Example 1:
# Step 1 Predicted (score): True (0.90) Label: True
# Step 2 Predicted (score): False (0.55) Label: True
# Step 3 Predicted (score): False (0.62) Label: True
# Step 4 Predicted (score): False (0.96) Label: False
```
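In best-of-N settings, a PRM's per-step scores are commonly reduced to one solution-level score, typically the minimum or the product over steps, so that candidate solutions can be ranked. A minimal sketch with hypothetical numbers (`solution_score_min` and `solution_score_prod` are names introduced here; it assumes `step_scores` already hold P(step correct) — when the pipeline predicts `LABEL_0`, that probability is `1 - score`):

```python
import math

def solution_score_min(step_scores):
    """Solution-level score as the weakest step's probability of being correct."""
    return min(step_scores)

def solution_score_prod(step_scores):
    """Solution-level score as the product of per-step probabilities."""
    return math.prod(step_scores)

good = [0.99, 0.99, 0.94, 0.82]  # a solution whose steps all look sound
bad  = [0.99, 0.99, 0.58, 0.38]  # a solution that goes wrong midway

# Under either aggregation, the sound solution ranks higher:
assert solution_score_min(good) > solution_score_min(bad)
assert solution_score_prod(good) > solution_score_prod(bad)
```

The minimum is the stricter choice: a single bad step sinks the whole solution regardless of how confident the other steps are.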
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/plaguss/huggingface/runs/obk416rg)
This model was trained with TRL's Stepwise Reward Trainer, which fits a process reward model (PRM) on per-step labels.
### Framework versions
- TRL: 0.13.0.dev0
- Transformers: 4.47.0
- Pytorch: 2.4.1
- Datasets: 3.0.1
- Tokenizers: 0.21.0
## Citations
Cite Stepwise Reward as:
```bibtex
@article{uesato2022solving,
title = {Solving Math Word Problems With Process- and Outcome-Based Feedback},
author = {Uesato, Jonathan and Kushman, Nate and Kumar, Ramana and Song, Francis and Siegel, Noah and Wang, Lisa and Creswell, Antonia and Irving, Geoffrey and Higgins, Irina},
year = 2022,
journal = {arXiv preprint arXiv:2211.14275}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```