---
library_name: peft
license: apache-2.0
---

### Framework versions

- PEFT 0.5.0

---

# Model Card for MCQ-Classifier-MMLU-EFG

MCQ-Classifier is a parameter-efficient fine-tuned Mistral-7B-v0.1 model that automatically detects a model's answer to multiple-choice questions.

This model is trained on annotated model outputs on the MMLU dataset. We collected responses from Llama2-7b-chat, Llama2-13b-chat, and Mistral-7b-Inst-v0.2.

For full details of this model, please read our [paper](https://arxiv.org/abs/2404.08382).

## "EFG"

During our annotation phase, we noticed that models may not choose one of the available answer candidates, but instead refuse to answer or claim "No correct answer available."
We therefore consider three additional cases, "Refusal", "No correct answer", and "I don't know", and add these three options to the answer candidates, extending the option range from "A-D" to "A-G".
Note that we shuffle the order of the options in our dataset; therefore, "EFG" does not necessarily correspond to "Refusal", "No correct answer", and "I don't know".

Also note that if the model refuses to answer for safety reasons, the answer will be mapped to the refusal option, such as "D. Refused".
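
For illustration, here is a minimal sketch of how such an extended, shuffled option list could be assembled (our own illustrative code, not the released data pipeline; the candidate texts are taken from the example below):

```python
import random

# Original four MMLU answer candidates for the example question
candidates = ["Lillian Gilbreth", "Frederick Taylor", "Mary Parker Follett", "Elton Mayo"]
# The three extra cases added during annotation
extra = ["No correct answer is given", "I do not know", "Refused"]

# Shuffle so that letters A-G do not map to fixed meanings
options = candidates + extra
random.shuffle(options)

references = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCDEFG", options))
print(references)
```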

## Run the model

You should construct your input in the following format: model_response + "\nReferences:" + references + "\nAnswer:"

For example:
```
inputs = ' Sure! I can help you with that. The answer to the question is:\n\nB. Frederick Taylor \nReferences: \nA. Lillian Gilbreth \nB. Frederick Taylor \nC. No correct answer is given \nD. I do not know \nE. Refused \nF. Mary Parker Follett \nG. Elton Mayo \nAnswer:'
```
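
If you are assembling the input programmatically, a minimal sketch (the variables `model_response` and `references` are hypothetical, shown only to mirror the format above):

```python
# Hypothetical pieces; in practice these come from the evaluated model's output
# and the shuffled option list for the question.
model_response = ' Sure! I can help you with that. The answer to the question is:\n\nB. Frederick Taylor '
references = ' \nA. Lillian Gilbreth \nB. Frederick Taylor \nC. No correct answer is given \nD. I do not know \nE. Refused \nF. Mary Parker Follett \nG. Elton Mayo '

# Concatenate following the documented format
inputs = model_response + "\nReferences:" + references + "\nAnswer:"
```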
then feed it to the classifier:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the PEFT adapter on top of the Mistral-7B base model
config = PeftConfig.from_pretrained("mainlp/MCQ-Classifier-MMLU-EFG")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base_model, "mainlp/MCQ-Classifier-MMLU-EFG")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Wrap the input in the Mistral instruction template and classify
to_classify = f"""<s>[INST] Classify the response.{inputs} [/INST]"""
model_input = tokenizer(to_classify, return_tensors="pt")

# Greedy decoding of a single token: the predicted option letter
output = model.generate(**model_input, max_new_tokens=1, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
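
The decoded output repeats the prompt followed by the predicted option letter. A minimal sketch for extracting just the newly generated letter (this post-processing step is our addition, not part of the original card):

```python
# Decode only the tokens generated after the prompt to get the bare option letter
prompt_len = model_input["input_ids"].shape[1]
predicted_option = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
print(predicted_option)  # e.g. "B"
```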

## Cite
```
@article{wang2024my,
  title={"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models},
  author={Wang, Xinpeng and Ma, Bolei and Hu, Chengzhi and Weber-Genzel, Leon and R{\"o}ttger, Paul and Kreuter, Frauke and Hovy, Dirk and Plank, Barbara},
  journal={arXiv preprint arXiv:2402.14499},
  year={2024}
}
```

```
@article{wang2024look,
  title={Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think},
  author={Wang, Xinpeng and Hu, Chengzhi and Ma, Bolei and R{\"o}ttger, Paul and Plank, Barbara},
  journal={arXiv preprint arXiv:2404.08382},
  year={2024}
}
```