---
library_name: peft
license: apache-2.0
---

### Framework versions

- PEFT 0.5.0

---

# Model Card for MCQ-Classifier-MMLU-EFG

MCQ-Classifier is a parameter-efficient fine-tuned Mistral-7B-v0.1 model that automatically detects a model's answer to multiple-choice questions.

This model is trained on annotated model outputs on the MMLU dataset. We collected responses from Llama2-7b-chat, Llama2-13b-chat, and Mistral-7b-Inst-v0.2.

For full details of this model, please read our [paper](https://arxiv.org/abs/2404.08382).

## "EFG"

During our annotation phase, we noticed that models may not choose one of the available answer candidates, but instead refuse to answer or claim "No correct answer available."
We therefore consider three additional cases, "Refusal", "No correct answer", and "I don't know", and add these three options to the answer candidates, extending the option range from "A-D" to "A-G".
Note that we shuffle the order of the options in our dataset; therefore, "EFG" does not necessarily correspond to "Refusal", "No correct answer", and "I don't know".

Also note that if the model refuses to answer for safety reasons, the answer will be mapped to the refusal option, such as "D. Refused".
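
For illustration, here is a minimal sketch of how such an extended, shuffled option list could be assembled (our own illustrative code, not the released data pipeline; the candidate texts are taken from the example below):

```python
import random

# Original four MMLU answer candidates for the example question
candidates = ["Lillian Gilbreth", "Frederick Taylor", "Mary Parker Follett", "Elton Mayo"]
# The three extra cases added during annotation
extra = ["No correct answer is given", "I do not know", "Refused"]

# Shuffle so that letters A-G do not map to fixed meanings
options = candidates + extra
random.shuffle(options)

references = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCDEFG", options))
print(references)
```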

## Run the model

You should construct your input in the following format: model_response + "\nReferences:" + references + "\nAnswer:"

For example:
```
inputs = ' Sure! I can help you with that. The answer to the question is:\n\nB. Frederick Taylor \nReferences: \nA. Lillian Gilbreth \nB. Frederick Taylor \nC. No correct answer is given \nD. I do not know \nE. Refused \nF. Mary Parker Follett \nG. Elton Mayo \nAnswer:'
```
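
If you are assembling the input programmatically, a minimal sketch (the variables `model_response` and `references` are hypothetical, shown only to mirror the format above):

```python
# Hypothetical pieces; in practice these come from the evaluated model's output
# and the shuffled option list for the question.
model_response = ' Sure! I can help you with that. The answer to the question is:\n\nB. Frederick Taylor '
references = ' \nA. Lillian Gilbreth \nB. Frederick Taylor \nC. No correct answer is given \nD. I do not know \nE. Refused \nF. Mary Parker Follett \nG. Elton Mayo '

# Concatenate following the documented format
inputs = model_response + "\nReferences:" + references + "\nAnswer:"
```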
then feed it to the classifier:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the PEFT adapter on top of the Mistral-7B base model
config = PeftConfig.from_pretrained("mainlp/MCQ-Classifier-MMLU-EFG")
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base_model, "mainlp/MCQ-Classifier-MMLU-EFG")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Wrap the input in the Mistral instruction template and classify
to_classify = f"""<s>[INST] Classify the response.{inputs} [/INST]"""
model_input = tokenizer(to_classify, return_tensors="pt")

# Greedy decoding of a single token: the predicted option letter
output = model.generate(**model_input, max_new_tokens=1, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
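
The decoded output repeats the prompt followed by the predicted option letter. A minimal sketch for extracting just the newly generated letter (this post-processing step is our addition, not part of the original card):

```python
# Decode only the tokens generated after the prompt to get the bare option letter
prompt_len = model_input["input_ids"].shape[1]
predicted_option = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
print(predicted_option)  # e.g. "B"
```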

## Cite
```
@article{wang2024my,
  title={"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models},
  author={Wang, Xinpeng and Ma, Bolei and Hu, Chengzhi and Weber-Genzel, Leon and R{\"o}ttger, Paul and Kreuter, Frauke and Hovy, Dirk and Plank, Barbara},
  journal={arXiv preprint arXiv:2402.14499},
  year={2024}
}
```

```
@article{wang2024look,
  title={Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think},
  author={Wang, Xinpeng and Hu, Chengzhi and Ma, Bolei and R{\"o}ttger, Paul and Plank, Barbara},
  journal={arXiv preprint arXiv:2404.08382},
  year={2024}
}
```