---
datasets:
- bigscience/P3
language:
- en
---

A 3B T5 model trained on the [P3](https://huggingface.co/datasets/bigscience/P3) (T0 split) dataset for 20,000 steps with a batch size of 2048, a maximum input sequence length of 1024, a maximum output sequence length of 256, and the Adafactor optimizer with a constant learning rate of 0.001.
The model is trained from the [T5 v1.1 lm-adapt checkpoint](https://huggingface.co/google/t5-xl-lm-adapt) and fully finetuned.
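
A minimal usage sketch, assuming the standard `transformers` seq2seq API (the repo id below is a placeholder for this model's actual Hugging Face path):

```python
# Zero-shot inference sketch; "hypertask_T0_3B" is a placeholder for this model's
# actual Hugging Face repository id.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "hypertask_T0_3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# T0-style prompting: the task instruction is written in natural language in the input.
prompt = "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```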
For more details, see [HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation](https://arxiv.org/abs/2212.10315).

Performance on T0 held-out tasks (average accuracy across prompts, using rank classification):

| Model | ANLI (avg) | HellaSwag | StoryCloze | CB | COPA | RTE | WiC | WSC | WinoGrande | Average |
|--|--|--|--|--|--|--|--|--|--|--|
| [T0-3B](https://huggingface.co/bigscience/T0_3B) | 33.4 | 27.2 | 84.0 | 45.4 | 75.9 | 64.6 | 50.7 | 65.1 | 51.0 | 55.2 |
| hypertask_T0_3B (this model) | 41.7 | 30.1 | 96.9 | 72.7 | 89.1 | 81.2 | 51.7 | 57.2 | 59.2 | 64.4 |
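
The scores above are computed with rank classification: each answer choice is scored by the model's likelihood of generating it, and the highest-scoring choice is taken as the prediction. Below is a minimal sketch of that scoring scheme (illustrative only, not the exact evaluation harness used for these numbers; the repo id is again a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "hypertask_T0_3B"  # placeholder repo id for this model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

def rank_classify(prompt: str, choices: list[str]) -> str:
    """Return the answer choice with the highest sequence log-likelihood under the model."""
    scores = []
    for choice in choices:
        inputs = tokenizer(prompt, return_tensors="pt")
        labels = tokenizer(choice, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(**inputs, labels=labels)
        # out.loss is the mean cross-entropy per target token; negate and rescale by the
        # target length to recover the total log-likelihood of the choice.
        scores.append(-out.loss.item() * labels.shape[1])
    return choices[scores.index(max(scores))]

# Example: an RTE-style prompt with verbalized answer choices.
prediction = rank_classify(
    "Premise: The cat sat on the mat. Hypothesis: The mat is occupied. True or False?",
    ["True", "False"],
)
print(prediction)
```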