vwxyzjn committed
Commit ed41d79 · verified · 1 Parent(s): fd424bd

Update evaluation results via RewardBench

Files changed (1)
  1. README.md +241 -39
README.md CHANGED
@@ -1,59 +1,261 @@
  ---
- license: mit
- base_model: HuggingFaceH4/mistral-7b-sft-beta
- tags:
- - trl
- - reward-trainer
- - generated_from_trainer
  model-index:
- - name: rm_zephyr_new
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # rm_zephyr_new

- This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 3e-06
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 7
- - gradient_accumulation_steps: 32
- - total_train_batch_size: 224
- - total_eval_batch_size: 7
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 0.1

- ### Training results

- ### Framework versions

- - Transformers 4.40.1
- - Pytorch 2.2.1+cu121
- - Datasets 2.18.0
- - Tokenizers 0.19.1
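The hyperparameter block removed above describes a TRL RewardTrainer run (per the card's trl/reward-trainer tags). As a non-authoritative sketch, those values would map onto TRL's RewardConfig roughly as follows; the output directory is a placeholder, and the model, dataset, and sequence length are not documented in the card:

```python
from trl import RewardConfig

# Hedged sketch only: mirrors the hyperparameters listed in the removed card.
args = RewardConfig(
    output_dir="rm_zephyr_new",      # placeholder, not stated in the card
    learning_rate=3e-6,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=32,  # 1 per device x 7 GPUs x 32 steps = 224 total
    lr_scheduler_type="linear",
    num_train_epochs=0.1,
)
```

Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the TrainingArguments defaults, so it needs no explicit setting.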
  ---
+ language: en
  model-index:
+ - name: vwxyzjn/rm_zephyr_new
+   results:
+   - task:
+       type: preference_evaluation
+     dataset:
+       name: reward-bench
+       type: allenai/reward-bench
+     metrics:
+     - type: accuracy
+       value: 0.5343383584589615
+   - task:
+       type: preference_evaluation
+     dataset:
+       name: alpacaeval-easy
+       type: preference_dataset
+     metrics:
+     - type: accuracy
+       value: 0.88
+     - type: accuracy
+       value: 0.8947368421052632
+     - type: accuracy
+       value: 0.6842105263157895
+     - type: accuracy
+       value: 0.34558823529411764
+     - type: accuracy
+       value: 0.6646341463414634
+     - type: accuracy
+       value: 0.6951219512195121
+     - type: accuracy
+       value: 0.6707317073170732
+     - type: accuracy
+       value: 0.676829268292683
+     - type: accuracy
+       value: 0.6829268292682927
+     - type: accuracy
+       value: 0.5609756097560976
+     - type: accuracy
+       value: 0.31521739130434784
+     - type: accuracy
+       value: 0.5531914893617021
+     - type: accuracy
+       value: 0.43478260869565216
+     - type: accuracy
+       value: 0.6044776119402985
+     - type: accuracy
+       value: 0.64
+     - type: accuracy
+       value: 0.12751677852348994
+     - type: accuracy
+       value: 0.7857142857142857
+     - type: accuracy
+       value: 0.5405405405405406
+     - type: accuracy
+       value: 0.775
+     - type: accuracy
+       value: 0.18
+     - type: accuracy
+       value: 0.58
+     - type: accuracy
+       value: 0.461038961038961
+     - type: accuracy
+       value: 0.66
  ---
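The accuracy values above are stored in the card's model-index block (the 0.5343 value is attached to the allenai/reward-bench entry; the remaining values sit under a single preference_dataset entry). A minimal sketch for reading them back, assuming huggingface_hub parses the block into EvalResult objects:

```python
from huggingface_hub import ModelCard

# Load the card for this repo and list the evaluation entries
# added to the model-index metadata by this commit.
card = ModelCard.load("vwxyzjn/rm_zephyr_new")
for result in card.data.eval_results or []:
    print(result.dataset_name, result.metric_type, result.metric_value)
```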

+ # Model Card for vwxyzjn/rm_zephyr_new

+ <!-- Provide a quick summary of what the model is/does. -->

+ ## Model Details

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** en
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

+ ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
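The getting-started snippet above is left as a placeholder. As a hedged illustration only: the previous card's trl/reward-trainer tags suggest the checkpoint is a reward model with a single-logit sequence-classification head, in which case scoring a conversation might look like the sketch below (the interface is an assumption, not confirmed by the card):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the repo stores a 1-label sequence-classification (reward) head,
# as TRL's RewardTrainer typically produces.
model_id = "vwxyzjn/rm_zephyr_new"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
# Format with the base model's chat template, then read the scalar reward logit.
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(reward)
```

Higher scores indicate responses the reward model prefers; the raw scale is uncalibrated, so values are only meaningful when comparing completions for the same prompt.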
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]