Update README.md
README.md
CHANGED
@@ -12,14 +12,14 @@ tags:
 # Erlangshen-RoBERTa-330M-UniMC-Chinese
 
 - Paper: [Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective](https://github.com/IDEA-CCNL/Fengshenbang-LM)
-- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
 - Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)
 
 ## 简介 Brief Introduction
 
-将自然语言理解任务转化为multiple choice任务,并且使用
 
-Convert natural language understanding tasks into multiple choice tasks, and use
 
 ## 模型分类 Model Taxonomy
 
@@ -37,16 +37,41 @@ avoiding problems in commonly used large generative models such as FLAN. It not
 
 ### 下游效果 Performance
 
-**
 
-| Model   | T0 11B | GLaM 60B | FLAN 137B | PaLM 540B | UniMC 235M |
-|---------|--------|----------|-----------|-----------|------------|
-| ANLI R1 | 43.6   | 40.9     | 47.7      | 48.4      | 52.0       |
-| ANLI R2 | 38.7   | 38.2     | 43.9      | 44.2      | 44.4       |
-| ANLI R3 | 41.3   | 40.9     | 47.0      | 45.7      | 47.8       |
-| CB      | 70.1   | 33.9     | 64.1      | 51.8      | 75.7       |
 
 ## 使用 Usage
 
 ```python3
 import argparse
@@ -56,7 +81,13 @@ from fengshen import UniMCPiplines
 total_parser = argparse.ArgumentParser("TASK NAME")
 total_parser = UniMCPiplines.piplines_args(total_parser)
 args = total_parser.parse_args()
-args.pretrained_model_path = 'IDEA-CCNL/Erlangshen-RoBERTa-
 
 train_data = []
 dev_data = []
@@ -75,9 +106,6 @@ test_data = [
 "id": 7759}
 ]
 
-
-model = UniMCPiplines(args)
-
 if args.train:
     model.fit(train_data, dev_data)
 result = model.predict(test_data)
@@ -12,14 +12,14 @@ tags:
 # Erlangshen-RoBERTa-330M-UniMC-Chinese
 
 - Paper: [Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective](https://github.com/IDEA-CCNL/Fengshenbang-LM)
+- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/unimc/)
 - Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)
 
 ## 简介 Brief Introduction
 
+将自然语言理解任务转化为multiple choice任务,并且使用 48 个 NLU 任务进行预训练
 
+Convert natural language understanding tasks into multiple choice tasks, and use 48 NLU tasks for pre-training
 
 ## 模型分类 Model Taxonomy
 
@@ -37,16 +37,41 @@ avoiding problems in commonly used large generative models such as FLAN. It not
 
 ### 下游效果 Performance
 
+**Few-shot**
+| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
+|------------|------------|----------|-----------|----------|-----------|-----------|-----------|----------|-----------|-----------|
+| Finetuning | 65.4 | 35.5 | 49 | 32.8 | 33 | 60.7 | 14.9 | 50 | 55.6 | 44.1 |
+| PET | 86.7 | 51.7 | 54.5 | 46 | 44 | 56 | 61.2 | 59.4 | 57.5 | 57.44 |
+| LM-BFF | 85.6 | 54.4 | 53 | 47.1 | 41.6 | 57.6 | 61.2 | 51.7 | 54.7 | 56.32 |
+| P-tuning | 88.3 | 56 | 54.2 | **57.6** | 41.9 | 60.9 | 59.3 | **62.9** | 58.1 | 59.91 |
+| EFL | 84.9 | 45 | 52.1 | 42.7 | 66.2 | 71.8 | 30.9 | 56.6 | 53 | 55.91 |
+| [UniMC-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-RoBERTa-110M-UniMC-Chinese) | 88.64 | 54.08 | 54.32 | 48.6 | 66.55 | 73.76 | 67.71 | 52.54 | 59.92 | 62.86 |
+| [UniMC-330M](https://huggingface.co/IDEA-CCNL/Erlangshen-RoBERTa-330M-UniMC-Chinese) | 89.53 | 57.3 | 54.25 | 50 | 70.59 | 77.49 | 78.09 | 55.73 | 65.16 | 66.46 |
+| [UniMC-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-MegatronBERT-1.3B-UniMC-Chinese) | **89.278** | **60.9** | **57.46** | 52.89 | **76.33** | **80.37** | **90.33** | 61.73 | **79.15** | **72.05** |
+
+**Zero-shot**
+
+| Model | eprstmt | csldcp | tnews | iflytek | ocnli | bustm | chid | csl | wsc | Avg |
+|---------------|-----------|-----------|-----------|-----------|-----------|----------|----------|----------|-----------|-----------|
+| GPT-zero | 57.5 | 26.2 | 37 | 19 | 34.4 | 50 | 65.6 | 50.1 | 50.3 | 43.4 |
+| PET-zero | 85.2 | 12.6 | 26.1 | 26.6 | 40.3 | 50.6 | 57.6 | 52.2 | 54.7 | 45.1 |
+| NSP-BERT | 86.9 | 47.6 | 51 | 41.6 | 37.4 | 63.4 | 52 | **64.4** | 59.4 | 55.96 |
+| ZeroPrompt | - | - | - | 16.14 | 46.16 | - | - | - | 47.98 | - |
+| Yuan1.0-13B | 88.13 | 38.99 | 57.47 | 38.82 | 48.13 | 59.38 | 86.14 | 50 | 38.99 | 56.22 |
+| ERNIE3.0-240B | 88.75 | **50.97** | **57.83** | **40.42** | 53.57 | 64.38 | 87.13 | 56.25 | 53.46 | 61.41 |
+| [UniMC-110M](https://huggingface.co/IDEA-CCNL/Erlangshen-RoBERTa-110M-UniMC-Chinese) | 86.16 | 31.26 | 46.61 | 26.54 | 66.91 | 73.34 | 66.68 | 50.09 | 53.66 | 55.7 |
+| [UniMC-330M](https://huggingface.co/IDEA-CCNL/Erlangshen-RoBERTa-330M-UniMC-Chinese) | 87.5 | 30.4 | 47.6 | 31.5 | 69.9 | 75.9 | 78.17 | 49.5 | 60.55 | 59.01 |
+| [UniMC-1.3B](https://huggingface.co/IDEA-CCNL/Erlangshen-MegatronBERT-1.3B-UniMC-Chinese) | **88.79** | 42.06 | 55.21 | 33.93 | **75.57** | **79.5** | **89.4** | 50.25 | **66.67** | **64.53** |
+
 
 
 ## 使用 Usage
+```shell
+git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
+cd Fengshenbang-LM
+pip install --editable .
+```
+
 
 ```python3
 import argparse
@@ -56,7 +81,13 @@ from fengshen import UniMCPiplines
 total_parser = argparse.ArgumentParser("TASK NAME")
 total_parser = UniMCPiplines.piplines_args(total_parser)
 args = total_parser.parse_args()
+args.pretrained_model_path = 'IDEA-CCNL/Erlangshen-RoBERTa-110M-UniMC-Chinese'
+args.learning_rate=2e-5
+args.max_length=512
+args.max_epochs=3
+args.batchsize=8
+args.default_root_dir='./'
+model = UniMCPiplines(args)
 
 train_data = []
 dev_data = []
@@ -75,9 +106,6 @@ test_data = [
 "id": 7759}
 ]
 
 if args.train:
     model.fit(train_data, dev_data)
 result = model.predict(test_data)
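
The README describes recasting each NLU task as a multiple-choice problem, but the hunks in this diff show only the tail of one `test_data` entry (`"id": 7759}`). The following is a minimal sketch of what such a conversion could look like in plain Python, without importing `fengshen`. The helper name `to_multiple_choice` and every field name other than `"id"` are assumptions chosen for illustration; they are not confirmed by this diff as the schema `UniMCPiplines` expects.

```python
from typing import Dict, List, Optional


def to_multiple_choice(
    text: str,
    choices: List[str],
    question: str,
    answer: Optional[str] = None,
    sample_id: int = 0,
) -> Dict[str, object]:
    """Cast one classification example as a multiple-choice item.

    Hypothetical helper: the keys "text", "question", "choice", "answer"
    and "label" are illustrative assumptions; only "id" appears in the diff.
    """
    if not choices:
        raise ValueError("choices must not be empty")
    if answer is not None and answer not in choices:
        raise ValueError(f"answer {answer!r} is not one of the choices")

    # In a zero-shot setting no gold answer is known, so the label stays None.
    label = choices.index(answer) if answer is not None else None

    return {
        "text": text,
        "question": question,
        "choice": list(choices),  # copy so the caller's list is not aliased
        "answer": answer,
        "label": label,
        "id": sample_id,
    }


# A topic-classification example rewritten as a multiple-choice item.
item = to_multiple_choice(
    text="The new SUV handles well and looks great.",
    choices=["sports", "cars", "finance"],
    question="Which category does this news item belong to?",
    answer="cars",
    sample_id=7759,
)
print(item["label"])  # 1, the index of "cars" in `choices`
```

Representing the label as an index into the option list keeps the same item shape usable for any task: sentiment, topic classification, and inference differ only in the question and the options supplied.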