This is the official model from the publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models" (arXiv, 2024).
TLDR: Divergent Chain of Thought (DCoT) consists of requiring models to generate multiple CoTs before choosing an answer. Adding DCoT data to instruction tuning allows models to improve performance through self-correction.
Load the Model
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
base_model_path = "microsoft/phi-2"
model = AutoModelForCausalLM.from_pretrained(
base_model_path,
torch_dtype=torch.bfloat16,
device_map="auto",
)
peft_model_id = "haritzpuerto/phi-2-dcot"
model.load_adapter(peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
Run the model
Prompt Template
[Question] {question} [Context] {document} [Options] {answer_options} [Number of answers] {k}
Note that not all commands (the text in brackets) are mandatory: [Context] and [Options] are optional.
[Context] refers to a paragraph that contains the answer to a question (for span-extraction QA).
[Options] refers to a list of candidate answers (for multiple-choice QA). The format is A) {answer option 1} B) {answer option 2}, ...
The minimal template is
[Question] {question} [Number of answers] {k}
Whether to include the context and options depends on your task.
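The template can also be assembled programmatically. The following is a minimal sketch: the helper name `build_dcot_prompt` and its parameters are hypothetical and not part of the released code. It assumes that fields are separated by newlines and that the prompt ends with `[Answer 1] `, as in the run example in this card.

```python
from typing import Optional, Sequence


def build_dcot_prompt(
    question: str,
    k: int,
    context: Optional[str] = None,
    options: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a DCoT prompt; `context` and `options` are optional fields."""
    if k < 1:
        raise ValueError("k must be at least 1")
    parts = [f"[Question] {question}"]
    if context:
        parts.append(f"[Context] {context}")
    if options:
        # Label the options A), B), C), ... as described in the template.
        labeled = " ".join(
            f"{chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)
        )
        parts.append(f"[Options] {labeled}")
    parts.append(f"[Number of answers] {k}")
    # Assumption: newline-separated fields, ending with "[Answer 1] " so that
    # the model starts writing its first chain of thought.
    return "\n".join(parts) + "\n[Answer 1] "


prompt = build_dcot_prompt(
    "What is the capital of France?",
    k=2,
    options=["Berlin", "Paris"],
)
print(prompt)
```

Omitting both `context` and `options` yields the minimal template.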
Response format
Expect the model to return text in the following format:
[Answer 1]CoT_1
[Answer 2]CoT_2
...
[Final answer] answer
The model should produce as many answers as requested with the command [Number of answers] {k}.
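Because the response follows a fixed marker layout, it can be split into its chains of thought and the final answer with a small parser. This is an illustrative sketch only: `parse_dcot_response` is a hypothetical helper, and it assumes the markers appear exactly as `[Answer i]` and `[Final answer]` and that any decoded `<s>` or `</s>` tokens should be discarded.

```python
import re
from typing import List, Tuple


def parse_dcot_response(text: str) -> Tuple[List[str], str]:
    """Split generated text into its chains of thought and the final answer."""
    # Drop <s> and </s> tokens if they were decoded along with the text.
    text = re.sub(r"</?s>", "", text)
    # Everything after the last "[Final answer]" marker is the final answer.
    body, sep, final = text.rpartition("[Final answer]")
    if not sep:
        # No final-answer marker found (e.g. generation was cut off).
        body, final = text, ""
    # Each chain of thought follows an "[Answer i]" marker; the chunk before
    # the first marker (the prompt, if present) is discarded.
    chunks = re.split(r"\[Answer \d+\]", body)[1:]
    return [chunk.strip() for chunk in chunks], final.strip()


sample = (
    "[Question] Q?\n[Number of answers] 2\n"
    "[Answer 1] First chain.\n[Answer 2] Second chain.\n"
    "[Final answer] D) Record the details.</s>"
)
cots, final_answer = parse_dcot_response(sample)
print(len(cots), final_answer)
```

An empty final answer signals that generation stopped before the `[Final answer]` marker, for example because `max_length` was reached.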
Run Example
prompt = "[Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?\n[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.\n[Number of answers] 2\n[Answer 1] "
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs.to(model.device), max_length=1024)
print(tokenizer.decode(output[0]))
You should get an output similar to:
<s> [Question] Juan and LaKeisha roll a few objects down a ramp. They want to see which object rolls the farthest. What should they do so they can repeat their investigation?
[Options] A) Put the objects in groups. B) Change the height of the ramp. C) Choose different objects to roll. D) Record the details of the investigation.
[Number of answers] 2
[Answer 1] 1. Juan and LaKeisha want to see which object rolls the farthest.
2. They have already rolled a few objects down the ramp.
3. To repeat their investigation, they need to do something that will affect the outcome of the experiment.
4. Putting the objects in groups will not affect the outcome of the experiment.
5. Changing the height of the ramp may affect the outcome, but it is not the best option as it requires changing the setup of the experiment.
6. Choosing different objects to roll may also affect the outcome, but it is not the best option as it does not address the issue of repeating the experiment.
7. The best option is to record the details of the investigation. This includes the objects used, the height of the ramp, and any other relevant information. By recording the details, Juan and LaKeisha can repeat the experiment with the same conditions and compare the results.
[Answer 2] Step 1: Identify the problem and the question.
Problem: Juan and LaKeisha want to see which object rolls the farthest.
Question: What should they do to repeat their investigation?
Step 2: Evaluate the options.
A) Put the objects in groups. - This option does not directly relate to the question of which object rolls the farthest, so it can be eliminated.
B) Change the height of the ramp. - This option also does not directly relate to the question of which object rolls the farthest, so it can be eliminated.
C) Choose different objects to roll. - This option is a possible solution to the question, but it does not guarantee that the object will roll the farthest.
D) Record the details of the investigation. - This option is a necessary step to repeat the investigation.
Step 3: Choose the best option.
The best option to repeat the investigation is to record the details of the investigation. This will allow them to replicate the conditions of the original experiment and compare the results.
[Final answer] D) Record the details of the investigation.</s>
Training details
We train all models using LoRA with the PEFT library. The main parameters are:
Param. name | Value |
---|---|
lora_r | 64 |
lora_alpha | 16 |
lora_dropout | 0.1 |
batch size | 4 |
learning_rate | 2e-4 |
weight_decay | 0.001 |
optim | paged_adamw_32bit |
lr_scheduler_type | constant |
Please check Appendix B of the paper for more details.
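As a rough sketch of how the table could map onto library arguments, the values can be grouped into keyword dictionaries. The grouping and the names `lora_kwargs` and `training_kwargs` are assumptions, not the authors' training script. In particular, treating the batch size as a per-device value and setting `task_type` to `CAUSAL_LM` are guesses, and target modules are left unspecified because the card does not state them.

```python
# Hyperparameters copied from the table above.
lora_kwargs = {
    "r": 64,                    # lora_r
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "task_type": "CAUSAL_LM",   # assumption: not stated in the table
}

training_kwargs = {
    # Assumption: the table's "batch size" is interpreted as per-device.
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
    "weight_decay": 0.001,
    "optim": "paged_adamw_32bit",
    "lr_scheduler_type": "constant",
}

for name, value in {**lora_kwargs, **training_kwargs}.items():
    print(f"{name} = {value}")
```

With `peft` and `transformers` installed, these dictionaries could be passed as `LoraConfig(**lora_kwargs)` and `TrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is yours to choose. The `paged_adamw_32bit` optimizer additionally requires `bitsandbytes`.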
Cite
If you find our work useful, please consider citing it:
@misc{puerto2024dcot,
title={Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models},
author={Haritz Puerto and Tilek Chubakov and Xiaodan Zhu and Harish Tayyar Madabushi and Iryna Gurevych},
year={2024},
eprint={2407.03181},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.03181},
}