---
license: apache-2.0
---

<img src="https://huggingface.co/datasets/hkust-nlp/deita-images/resolve/main/logo-final.png" alt="Deita banner" width="800" style="margin-left:auto; margin-right:auto; display:block;"/>

# Model Card for Deita Complexity Scorer

[GitHub](https://github.com/hkust-nlp/deita) | [Paper](https://arxiv.org/abs/2312.15685)

Deita is an open-source project designed to facilitate **Automatic Data Selection** for instruction tuning of Large Language Models (LLMs).

Deita Complexity Scorer is a tool for automatically annotating the instruction complexity of SFT data.

## Model description

- **Model type:** A model fine-tuned to automatically annotate instruction complexity
- **Language(s) (NLP):** Primarily English
- **Finetuned from model:** Llama-1-13b-hf

### Model Sources

- **Repository:** https://github.com/hkust-nlp/deita
- **Model Family:** Other models and the dataset can be found in the [Deita collection](https://huggingface.co/collections/hkust-nlp/deita-6569c198c174808d94cf5bd4).

## Performance

| Model                                          | Align     | Data Size  | MT-Bench | AlpacaEval(%) | OpenLLM (Avg.) |
|------------------------------------------------|-----------|------------|----------|---------------|----------------|
| **Proprietary Models**                         |           |            |          |               |                |
| GPT-4-Turbo                                    | ?         | --         | 9.32     | 97.70         | --             |
| GPT-4                                          | SFT + PPO | --         | 8.99     | 95.03         | --             |
| Claude-2                                       | SFT + PPO | --         | 8.06     | 91.36         | --             |
| GPT-3.5-turbo                                  | SFT + PPO | --         | 7.94     | 89.37         | --             |
| **Open-sourced Models based on LLaMA-1-13B**   |           |            |          |               |                |
| LIMA                                           | SFT       | 1K SFT        | 4.29     | 41.98         | 59.82          |
| WizardLM-13B                                   | SFT       | 70K SFT       | 6.35     | 75.31         | 58.96          |
| Vicuna-13B-v1.3                                | SFT       | 125K SFT      | 6.39     | 82.11         | 60.01          |
| Random                                         | SFT       | 10K SFT       | 6.03     | 71.52         | 60.14          |
| DEITA-LLaMA1-13B-v1.0-sft                           | SFT       | 10K SFT       | 6.60     | 78.01         | 64.27          |
| **Open-sourced Models based on LLaMA-2-13B**   |           |            |          |               |                |
| Tulu-2-13B                                     | SFT       | 326K SFT      | 6.70     | 78.90         | --             |
| Tulu-2-13B+DPO                                 | SFT + DPO | 326K SFT + 60K DPO | 7.00     | 89.50         | --             |
| LLaMA2-13B-Chat                                | SFT + PPO | --         | 6.65     | 81.09         | --             |
| WizardLM-13B-v1.2                              | SFT          | >70K SFT      | 7.09     | 89.17         | --             |
| Vicuna-13B-v1.5                                | SFT       | 125K SFT      | 6.57     | 78.80         | 61.63          |
| Random                                         | SFT       | 10K SFT       | 5.78     | 65.19         | 61.32          |
| DEITA-LLaMA2-13B-v1.0-sft                           | SFT       | 10K SFT       | 6.79     | 81.09         | 62.71          |
| **Open-sourced Models based on Mistral-7B**    |           |            |          |               |                |
| Mistral-7B-Instruct-v0.1                       | --        | --         | 6.84     | 69.65         | 60.45          |
| Zephyr-7B-sft                                  | SFT       | 200K SFT      | 5.32     | 75.12         | 60.93          |
| $\text{Zephyr-7B-}\beta$                       | SFT + DPO | 200K SFT + 60K DPO | 7.34     | 90.60         | 66.36          |
| OpenChat-3.5                                   | C-RLFT    | >>70K C-RLFT  | 7.81     | 88.51         | --             |
| Starling-7B                                    | C-RLFT + APA | >>70K C-RLFT + 183K APA | 8.09     | 91.99         | --             |
| Random                                         | SFT       | 10K SFT       | 5.89     | 56.90         | 61.72          |
| DEITA-7B-v1.0-sft (6K)                           | SFT       | 6K SFT       | 7.22     | 80.78         | 64.94          |
| DEITA-7B-v1.0-sft (10K)                  | SFT       | 10K SFT       | 7.32     | 81.67         | 64.00          |
| DEITA-7B-v1.0             | SFT + DPO | 6K SFT + 10K DPO   | 7.55     | 90.06         | 69.86          |

## Usage

Use the following prompt format and code to score the complexity of an instruction:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import numpy as np
from scipy.special import softmax

model_name = "hkust-nlp/deita-complexity-scorer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


def infer_complexity(model, tokenizer, input_text):
    # Prompt format the scorer was trained on.
    complexity_template = ("You are a helpful assistant. Please identify the complexity score of the following user query. \n##Query: {instruction}  \n##Complexity: ")
    user_input = complexity_template.format(instruction=input_text)
    input_ids = tokenizer.encode(user_input, return_tensors="pt")
    max_length = 512
    outputs = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        return_dict_in_generate=True,
        output_scores=True,
    )
    # Logits over the vocabulary for the first generated token.
    logprobs_list = outputs.scores[0][0]
    # Llama tokenizer ids for the digit tokens "1" through "6".
    id2score = {
        29896: "1",
        29906: "2",
        29941: "3",
        29946: "4",
        29945: "5",
        29953: "6",
    }
    score_template = np.array([1, 2, 3, 4, 5, 6])
    # Collect the logit of each digit token (insertion order matches score_template).
    score_logits = np.array([logprobs_list[k].item() for k in id2score])
    # Expected score under the softmax distribution over the six labels.
    score_npy = softmax(score_logits, axis=0)
    score_npy = score_npy * score_template
    score_npy = np.sum(score_npy, axis=0)
    return score_npy


# Example input
input_text = "write a performance review for a junior data scientist"
complexity_score = infer_complexity(model, tokenizer, input_text)
print(complexity_score)
```
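
The score returned above can then drive data selection. The sketch below is illustrative only, not the official Deita pipeline (which also combines a quality scorer and a diversity-aware filter): it ranks a small pool of instructions by predicted complexity and keeps the top-k. The `instructions` list and `top_k` value are made-up examples.

```python
# Illustrative sketch: rank instructions by complexity and keep the top-k.
# Assumes `model`, `tokenizer`, and `infer_complexity` from the snippet above.
instructions = [
    "say hello",
    "write a performance review for a junior data scientist",
    "prove that the sum of two even integers is even",
]

scored = [(text, infer_complexity(model, tokenizer, text)) for text in instructions]
scored.sort(key=lambda pair: pair[1], reverse=True)  # most complex first

top_k = 2  # illustrative cutoff
for text, score in scored[:top_k]:
    print(f"{score:.2f}\t{text}")
```

In the full Deita recipe, complexity scores are combined with quality scores and a diversity filter to select the final SFT subset; see the [GitHub repository](https://github.com/hkust-nlp/deita) for the complete pipeline.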



## Citation
If you find the content of this project helpful, please cite our paper as follows:

```
@misc{liu2023what,
      title={What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning}, 
      author={Wei Liu and Weihao Zeng and Keqing He and Yong Jiang and Junxian He},
      year={2023},
      eprint={2312.15685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```