plaguss (HF staff) committed
Commit 447453b
1 Parent(s): 32dbfed

Update README.md

Files changed (1)
  1. README.md +184 -4
README.md CHANGED
@@ -17,15 +17,195 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
+ Example 1)
+
+ ```python
+ from datasets import load_dataset
+ from transformers import pipeline
+ import os
+ os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
+
+ model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
+
+ pipe = pipeline("token-classification", model=model_name, device="cuda")
+ dataset = load_dataset("trl-lib/math_shepherd")
+ example = dataset["test"][10]
+
+ sep = "\n"
+
+ print(sep.join((example["prompt"], *example["completions"])))
+ for idx in range(1, len(example["completions"])+1):
+     text = sep.join((example["prompt"], *example["completions"][0:idx])) + sep
+     output = pipe(text)
+     score = float(output[-1]["score"])
+     pred = True if output[-1]["entity"] == "LABEL_1" else False
+     print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
+
+ # Grandma gave Bryce and Carter some raisins. Bryce received 6 more raisins than Carter, and Carter received half the number of raisins Bryce received. How many raisins did Bryce receive?
+ # Step 1: Let $b$ be the number of raisins Bryce received and $c$ be the number of raisins Carter received.
+ # Step 2: We are given that $b = c + 6$ and $c = \frac{1}{2}b$.
+ # Step 3: Substituting the second equation into the first equation, we get $b = c + 6 = \frac{1}{2}b + 6$.
+ # Step 4: Simplifying, we have $b = \frac{1}{2}b + 6$.
+ # Step 5: Subtracting $\frac{1}{2}b$ from both sides, we get $\frac{1}{2}b - b = 6$.
+ # Step 6: Simplifying further, we have $\frac{1}{2}b - 2b = 6$.
+ # Step 7: Combining like terms, we have $-\frac{1}{2}b = 6$.
+ # Step 8: Multiplying both sides by $-2$, we get $b = -12$.
+ # Step 9: Therefore, Bryce received $\boxed{-12}$ raisins.The answer is: -12
+ # Step 1 Predicted (score): True (0.99) Label: True
+ # Step 2 Predicted (score): True (0.99) Label: True
+ # Step 3 Predicted (score): True (0.94) Label: True
+ # Step 4 Predicted (score): True (0.82) Label: True
+ # Step 5 Predicted (score): True (0.58) Label: True
+ # Step 6 Predicted (score): False (0.62) Label: False
+ # Step 7 Predicted (score): False (0.77) Label: False
+ # Step 8 Predicted (score): False (0.91) Label: False
+ # Step 9 Predicted (score): False (0.97) Label: False
+ ```
+
+ Example 2)
+
+ ```python
+ from datasets import load_dataset
+ from transformers import pipeline
+ import os
+ os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
+
+ model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
+
+ pipe = pipeline("token-classification", model=model_name, device="cuda")
+ dataset = load_dataset("trl-lib/math_shepherd")
+ i = 32 # 10, 32
+ example = dataset["test"][i]
+
+ sep = "\n"
+
+ print(sep.join((example["prompt"], *example["completions"])))
+ for idx in range(1, len(example["completions"])+1):
+     text = sep.join((example["prompt"], *example["completions"][0:idx])) + sep
+     output = pipe(text)
+     score = float(output[-1]["score"])
+     pred = True if output[-1]["entity"] == "LABEL_1" else False
+     print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
+
+ # In the Golden State Team, each player earned points. Draymond earned 12 points, Curry earned twice the points as Draymond, Kelly earned 9, Durant earned twice the points as Kelly, Klay earned half the points as Draymond. How many points did the Golden States have in total?
+ # Step 1: Draymond earned 12 points, Curry earned twice the points as Draymond, which is 2*12 = 24 points.
+ # Step 2: Kelly earned 9 points, Durant earned twice the points as Kelly, which is 2*9 = 18 points.
+ # Step 3: Klay earned half the points as Draymond, which is 12/2 = <<12/2=6>>6 points.
+ # Step 4: The Golden State Team had 12+24+9+18+6 = <<12+24+9+18+6=51>>51 points. The answer is: 51
+ # Step 1 Predicted (score): True (1.00) Label: True
+ # Step 2 Predicted (score): True (1.00) Label: True
+ # Step 3 Predicted (score): True (1.00) Label: True
+ # Step 4 Predicted (score): False (0.96) Label: False
+ ```
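+
+ The loops above print one prediction per step. A minimal sketch of how those per-step outputs could be collapsed into a single solution-level score follows; the product-of-probabilities reduction and the `solution_score` helper name are illustrative assumptions, not something specified in this model card:
+
+ ```python
+ import math
+
+ def solution_score(step_outputs):
+     # step_outputs: one pipeline result per step, i.e. the `output[-1]` dicts
+     # collected in the loops above (keys: "entity", "score").
+     # Convert each to the probability that the step is correct, then multiply;
+     # taking min() over the steps is another common reduction.
+     probs = [
+         o["score"] if o["entity"] == "LABEL_1" else 1.0 - o["score"]
+         for o in step_outputs
+     ]
+     return math.prod(probs)
+
+ # With the step scores shown for Example 2 above:
+ # 1.00 * 1.00 * 1.00 * (1 - 0.96) ≈ 0.04
+ ```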
+
+ Example 3)
+
+ This example corresponds to the one shown in the [peiyi9979/math-shepherd-mistral-7b-prm](https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm) model card:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import pipeline
+ import os
+ os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
+
+ model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
+
+ pipe = pipeline("token-classification", model=model_name, device="cuda")
+
+ examples = [
+     {
+         "prompt": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "completions": [
+             "Step 1: Janet's ducks lay 16 eggs per day.",
+             'Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left.',
+             'Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left.',
+             "Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18"
+         ],
+         "labels": [True, True, True, True]
+     },
+     {
+         "prompt": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "completions": [
+             "Step 1: Janet's ducks lay 16 eggs per day.",
+             'Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left.',
+             'Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left.',
+             "Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 17"
+         ],
+         "labels": [True, True, True, False]
+     },
+ ]
+
+ sep = "\n"
+
+ for i, example in enumerate(examples):
+     print(f"- Example {i}:")
+     for idx in range(1, len(example["completions"])+1):
+         text = "\n".join((example["prompt"], *example["completions"][0:idx])) + "\n"
+         output = pipe(text)
+         score = float(output[-1]["score"])
+         pred = True if output[-1]["entity"] == "LABEL_1" else False
+         print(f"Step {idx}\tPredicted (score): {pred} ({score:.2f})\tLabel: {example['labels'][idx-1]}")
+
+ # - Example 0:
+ # Step 1 Predicted (score): True (0.90) Label: True
+ # Step 2 Predicted (score): False (0.55) Label: True
+ # Step 3 Predicted (score): False (0.62) Label: True
+ # Step 4 Predicted (score): False (0.90) Label: True
+ # - Example 1:
+ # Step 1 Predicted (score): True (0.90) Label: True
+ # Step 2 Predicted (score): False (0.55) Label: True
+ # Step 3 Predicted (score): False (0.62) Label: True
+ # Step 4 Predicted (score): False (0.96) Label: False
+ ```
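+
+ The pipeline calls above can also be reproduced with the lower-level token-classification API. A minimal sketch, assuming the checkpoint's two labels are ordered `LABEL_0`/`LABEL_1` as the outputs above suggest (the `last_step_prob` helper is illustrative):
+
+ ```python
+ import torch
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ model_name = "plaguss/Qwen2.5-0.5B-Math-Shepherd-PRM-0.2"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+
+ def last_step_prob(text: str) -> float:
+     # Probability that the most recent step in `text` is correct, read from the
+     # last token's logits, mirroring `output[-1]` in the pipeline-based loops above.
+     inputs = tokenizer(text, return_tensors="pt")
+     with torch.no_grad():
+         logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)
+     return logits[0, -1].softmax(dim=-1)[1].item()  # index 1 == LABEL_1
+ ```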

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/plaguss/huggingface/runs/obk416rg)