Commit 747e42f (verified) · Bohr committed · 1 Parent(s): fc5bf44

Create README.md

Files changed (1):
  1. README.md +75 -0 (ADDED)

## 📖 Introduction

**DistilQwen2-7B** is a distilled version of **Qwen2-7B-Instruct**, designed to transfer the capabilities of stronger LLMs into smaller models. To achieve this, we utilized a diverse range of datasets for the distillation process, including well-known open-source collections such as Magpie, OpenHermes, and MAmmoTH2, as well as proprietary synthetic datasets.

The training data primarily consists of instructions in Chinese and English. To enhance the quality and diversity of the instruction data, we implemented a difficulty scoring system and task-related resampling techniques.

For difficulty scoring, we employed the LLM-as-a-Judge paradigm, using the teacher model to evaluate responses based on accuracy, relevance, helpfulness, and level of detail. We then calculated the Model Fitting Difficulty (MFD) Score by subtracting the student model's score from the teacher model's score. A higher MFD Score indicates that the instruction is more valuable for distillation training. This approach allowed us to remove low-difficulty instructions from the training set, focusing on more challenging and informative examples.
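
To make the filtering step concrete, here is a minimal sketch of how MFD-based selection could work. It is illustrative only: the `judge` helper, the field names, and the threshold are assumptions, not the released training pipeline; in practice the judge is the teacher model prompted in the LLM-as-a-Judge style described above.

```python
# Hypothetical sketch of MFD-based data filtering; judge() and the threshold
# are illustrative assumptions, not the actual DistilQwen2 pipeline.
from typing import Callable

def filter_by_mfd(
    examples: list[dict],                 # each dict holds "instruction", "teacher_response", "student_response"
    judge: Callable[[str, str], float],   # LLM-as-a-Judge: (instruction, response) -> quality score
    threshold: float = 1.0,
) -> list[dict]:
    kept = []
    for ex in examples:
        teacher_score = judge(ex["instruction"], ex["teacher_response"])
        student_score = judge(ex["instruction"], ex["student_response"])
        mfd = teacher_score - student_score   # higher = harder for the student, more valuable
        if mfd >= threshold:                  # drop low-difficulty instructions
            kept.append(ex)
    return kept
```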

This careful curation and scoring process ensures that **DistilQwen2-7B** achieves high performance after the distillation process.

## 🚀 Quick Start

The following code snippet shows how to load the tokenizer and model, and how to generate content with `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the distilled model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "alibaba-pai/DistilQwen2-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2-7B-Instruct")

# Build a chat prompt using the model's chat template
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response and strip the prompt tokens from the output
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
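
For interactive use, you may prefer to stream tokens as they are generated. Below is a minimal sketch using `TextStreamer` from `transformers`, reusing the `model`, `tokenizer`, and `model_inputs` defined above; the streaming settings shown here are an assumption, not part of the original card.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
    streamer=streamer,
)
```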

## 🔍 Evaluation

We used single-turn instructions from MT-Bench as input for Qwen2-1.5B-Instruct and Qwen2-7B-Instruct, and GPT-4-turbo was used to evaluate the level of detail and truthfulness of the responses. Results on AlpacaEval 2.0 (length-controlled), MT-Bench, and IFEval are reported below.
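
As a rough illustration of this LLM-as-a-Judge setup (not the exact evaluation script; the prompt wording, scoring scale, and judge model name are assumptions), a single scoring call might look like:

```python
# Illustrative GPT-4-turbo judging sketch; prompt, scale, and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

JUDGE_PROMPT = (
    "Rate the following response to the given instruction on a 1-10 scale "
    "for level of detail and truthfulness. Reply with the number only.\n\n"
    "Instruction: {instruction}\n\nResponse: {response}"
)

def judge(instruction: str, response: str) -> float:
    completion = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(instruction=instruction, response=response)}],
        temperature=0,
    )
    return float(completion.choices[0].message.content.strip())
```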

| Model | AlpacaEval 2.0 (length-controlled) | MT-Bench | MT-Bench (single) | IFEval (instruction-loose) | IFEval (strict-prompt) |
|------|-----------------------------------|----------|-------------------|---------------------------|------------------------|
| Qwen2-1.5B-Instruct | 5.22 | 5.85 | 6.45 | 41.37 | 28.10 |
| DistilQwen2-1.5B-Instruct | 8.28 | 6.42 | 7.12 | 49.76 | 36.04 |
| Qwen2-7B-Instruct | 24.33 | 8.27 | 8.68 | 66.67 | 52.31 |
| DistilQwen2-7B-Instruct | 25.35 | 8.40 | 9.03 | 71.46 | 60.26 |

## 📜 Citation

If you find our work helpful, please cite it!

```bibtex
@misc{TAPIR,
      title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
      author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
      year={2024},
      eprint={2405.13448},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.13448},
}
```