Bohr committed on
Commit de43994 · verified · 1 Parent(s): 69f6e29

Create README.md

Files changed (1)
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
+ ## 📖 Introduction
+
+ **Instruction-Tagger** is a model that labels each instruction with one of 33 task tags, such as Medicine or Code Generation. The tags let users measure and adjust the proportion of each task type in a dataset.
+
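To illustrate how task tags can be used to adjust the proportions in a dataset, here is a minimal sketch that assumes the instructions have already been tagged. The helper names `tag_proportions` and `cap_per_tag` and the toy data are hypothetical and not part of the Instruction-Tagger release; capping each tag is only one of several possible rebalancing strategies.

```python
import random
from collections import Counter, defaultdict


def tag_proportions(tags):
    """Return the share of each tag in a list of tags."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: n / total for tag, n in counts.items()}


def cap_per_tag(examples, max_per_tag, seed=0):
    """Downsample (instruction, tag) pairs so no tag exceeds max_per_tag."""
    rng = random.Random(seed)
    by_tag = defaultdict(list)
    for instruction, tag in examples:
        by_tag[tag].append(instruction)

    balanced = []
    for tag, items in by_tag.items():
        rng.shuffle(items)  # pick a random subset of each over-represented tag
        balanced.extend((text, tag) for text in items[:max_per_tag])
    return balanced


# Hypothetical toy data: 6 Math, 2 Medicine, 2 Law instructions.
examples = (
    [(f"math question {i}", "Math") for i in range(6)]
    + [(f"medical question {i}", "Medicine") for i in range(2)]
    + [(f"legal question {i}", "Law") for i in range(2)]
)

before = tag_proportions([tag for _, tag in examples])
balanced = cap_per_tag(examples, max_per_tag=2)
after = tag_proportions([tag for _, tag in balanced])

print("before:", before)
print("after: ", after)
```

In this toy case Math makes up 60% of the data before capping and one third afterwards. In practice the tags would come from the model's predictions rather than from hand-written labels.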
+ #### Example Input
+
+ > What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?
+
+ #### Example Output
+ > Medicine
+
+ ## 🚀 Quick Start
+
+ The following code snippet shows how to load the tokenizer and model and how to predict the task tag for an instruction.
+
+ ```python
+ import torch
+ from transformers import DebertaV2ForSequenceClassification, DebertaV2Tokenizer
+
+ # Assumption: the classifier weights are published in the same repository as the
+ # tokenizer. The original snippet loaded them from a local 'deberta_cls' directory;
+ # point MODEL_PATH there if you have a local checkpoint instead.
+ MODEL_PATH = 'alibaba-pai/Instruction-Tagger'
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ model = DebertaV2ForSequenceClassification.from_pretrained(MODEL_PATH, num_labels=33).to(device)
+ model.eval()
+ tokenizer = DebertaV2Tokenizer.from_pretrained(MODEL_PATH)
+
+ # Mapping from class id to task tag.
+ labels = {
+     0: 'Common-Sense',
+     1: 'History',
+     2: 'Law',
+     3: 'Code Generation',
+     4: 'Technology',
+     5: 'Sport',
+     6: 'Code Debug',
+     7: 'Toxicity',
+     8: 'Counterfactual',
+     9: 'Physics',
+     10: 'Academic Writing',
+     11: 'Philosophy',
+     12: 'Music',
+     13: 'Math',
+     14: 'Writing',
+     15: 'Complex Format',
+     16: 'Art',
+     17: 'Grammar',
+     18: 'Computer Science',
+     19: 'Economy',
+     20: 'Paraphrase',
+     21: 'Reasoning',
+     22: 'Medicine',
+     23: 'Biology',
+     24: 'Health',
+     25: 'Ethics',
+     26: 'Chemistry',
+     27: 'Multilingual',
+     28: 'Ecology',
+     29: 'Roleplay',
+     30: 'Entertainment',
+     31: 'Others',
+     32: 'Literature',
+ }
+
+ def task_cls(instruction):
+     """Return the predicted task tag for a single instruction string."""
+     inputs = tokenizer(instruction, return_tensors='pt', padding=True, truncation=True).to(device)
+
+     with torch.no_grad():
+         logits = model(**inputs).logits
+
+     # One input gives logits of shape (1, 33); take the highest-scoring class.
+     predicted_class_id = logits.argmax(dim=-1).item()
+     return labels[predicted_class_id]
+
+ instruct = "What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?"
+
+ tag = task_cls(instruct)
+ print(tag)
+ ```
+
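The Quick Start snippet returns only the single highest-scoring tag. When a rough confidence or a few runner-up tags would be useful, the logits can be turned into probabilities with a softmax. The sketch below is plain Python with made-up logits and a hypothetical four-class label map, so it runs without the model; with the real classifier, the list of scores could be obtained with something like `logits[0].tolist()`. The function name `top_k_tags` is an assumption for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_tags(logits, id2label, k=3):
    """Return the k most probable (tag, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical 4-class example; the real model has 33 classes.
id2label = {0: "Common-Sense", 1: "History", 2: "Law", 3: "Code Generation"}
logits = [0.1, 0.3, 2.5, -1.0]

ranked = top_k_tags(logits, id2label, k=2)
for tag, prob in ranked:
    print(f"{tag}: {prob:.3f}")
```

Softmax scores from a classifier are not guaranteed to be well calibrated, so any threshold on them should be checked against manually judged examples.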
+ ## 🔍 Evaluation
+
+ To assess classification quality, we manually evaluated a sample of 100 instructions that were not in the training set. The predicted tag was judged correct for 92 of them, a classification precision of 92%.
+
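Because the figure above comes from a sample of 100 instructions, it carries some sampling uncertainty. As a rough illustration (not part of the authors' evaluation), the sketch below computes a 95% Wilson score interval for 92 correct predictions out of 100. The function name `wilson_interval` is hypothetical, and the interval assumes the sample was drawn at random.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 corresponds to about 95%."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


low, high = wilson_interval(92, 100)
print(f"precision = {92 / 100:.2f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

Under these assumptions the interval runs from roughly 0.85 to 0.96, so a larger evaluation sample would be needed to pin the precision down more tightly.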
+ ## 📜 Citation
+
+ If you find our work helpful, please cite it!
+
+ ```
+ @misc{TAPIR,
+     title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
+     author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
+     year={2024},
+     eprint={2405.13448},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL},
+     url={https://arxiv.org/abs/2405.13448},
+ }
+ ```