Commit 151eadf (parent: c2dba12) by emnlp 2023: Update README.md

README.md CHANGED:
```diff
@@ -10,7 +10,7 @@ metrics:
 - exact_match
 - rouge
 model-index:
-- name:
+- name: calc-flan-xl
   results:
   - task:
       type: question-answering
```
```diff
@@ -29,7 +29,7 @@ language:
 - en
 ---
 
-# Model Card for
+# Model Card for calc-flan-xl
 
 <!-- Provide a quick summary of what the model is/does. -->
 
```
```diff
@@ -61,7 +61,7 @@ which is subsequently served by extending model's decoder input context by addin
 
 <!-- Provide the basic links for the model. -->
 
-- **Repository:** https://github.com/
+- **Repository:** https://github.com/emnlp2023sub/gadgets
 - **Paper:** Stay tuned!
 
 ## Usage
```
```diff
@@ -70,8 +70,8 @@ Additionally to conventional generation, using Tool-augmented generation require
 (1) implementation of the tool(s) and
 (2) a customization of generate() method augmenting input context on-demand with the outputs of the tools.
 
-You can find these two components implemented in the attached **gadget_assisted_model.py** and **gadget.py** in this model's repo
-and the project's [home repo](https://github.com/
+You can find these two components implemented in the attached **gadgets/gadget_assisted_model.py** and **gadgets/gadget.py** in this model's repo
+and the project's [home repo](https://github.com/emnlp2023sub/gadgets).
 
 After adding these two scripts to your directory, you can use the model as follows:
 
```
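The mechanism in point (2) above can be sketched as a small self-contained toy: after the model emits a tool call, the driver evaluates it and appends the tool's output to the decoding context. The `<gadget>`/`<output>` tag names and the driver function here are illustrative assumptions, not the project's actual `gadget.py` API.

```python
import re

# Toy sketch of step (2): when the model emits a gadget call, the driver
# evaluates it and appends the tool's output to the decoder context before
# generation continues. Tag names and this driver are illustrative
# assumptions, not the project's actual gadget.py implementation.

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression (the +-*/ scope of this model)."""
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(eval(expression))  # safe here: input is whitelisted above

def augment_context(context: str) -> str:
    """If the context ends with a gadget call, append its <output>…</output>."""
    match = re.search(r"<gadget>(.*?)</gadget>$", context)
    if match is None:
        return context
    return context + f"<output>{calculator(match.group(1))}</output>"

step = "The total cost is<gadget>5*160</gadget>"
print(augment_context(step))
# → The total cost is<gadget>5*160</gadget><output>800</output>
```

A real driver would run this augmentation inside the generation loop, re-invoking the model on the extended context until a final result tag is produced.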
```diff
@@ -87,8 +87,8 @@ class GadgetAssistedT5(GadgetAssistedModel, T5ForConditionalGeneration):
     pass
 
 
-model = GadgetAssistedT5.from_pretrained("emnlp2023/calc-
-tokenizer = T5Tokenizer.from_pretrained("emnlp2023/calc-
+model = GadgetAssistedT5.from_pretrained("emnlp2023/calc-flan-xl")
+tokenizer = T5Tokenizer.from_pretrained("emnlp2023/calc-flan-xl")
 
 model.prepare_for_generate(tokenizer,
                            enabled_gadgets=[Calculator()],
```
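The subclass in the hunk above relies on Python's method resolution order: the mixin (here `GadgetAssistedModel`) is listed first so its overridden `generate()` runs, deferring to the underlying Hugging Face model class via `super()`. A self-contained toy with stand-in classes (not the project's real implementation) illustrates the pattern:

```python
# Stand-in classes illustrating the mixin pattern above: the mixin's
# generate() runs first and can post-process or loop over the base
# model's generation. All names here are illustrative stand-ins.

class BaseModel:  # plays the role of T5ForConditionalGeneration
    def generate(self, prompt: str) -> str:
        return prompt + "<gadget>5*160</gadget>"

class GadgetMixin:  # plays the role of GadgetAssistedModel
    def generate(self, prompt: str) -> str:
        draft = super().generate(prompt)  # defers to BaseModel via the MRO
        # A real implementation would evaluate the gadget call here and
        # feed the output back into the context until a result appears.
        return draft + "<output>800</output>"

class ToyGadgetAssistedT5(GadgetMixin, BaseModel):  # mixin listed first
    pass

print(ToyGadgetAssistedT5().generate("Total cost is"))
# → Total cost is<gadget>5*160</gadget><output>800</output>
```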
```diff
@@ -118,24 +118,16 @@ Final result is<result>800</result></s>
 Note that given the limited scope of the exercises' complexity in the training, this model will not work well for tasks requiring
 more complex algebraic operations, including equations, variables and operations outside the scope of (+-*/).
 
+
 ## Training Details
 
 ### Training Data
-
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+This model was trained on our Calculator-augmented set of
 
-
-[
-[
-[
+- [Calc Ape210k](https://huggingface.co/datasets/emnlp2023/Calc-ape210k) ([original Ape210k on github](https://github.com/Chenny0808/ape210k))
+- [Calc MathQA](https://huggingface.co/datasets/emnlp2023/Calc-math_qa) ([original MathQA on HF](https://huggingface.co/datasets/math_qa))
+- [Calc GSM8K](https://huggingface.co/datasets/emnlp2023/Calc-gsm8k) ([original GSM8K on HF](https://huggingface.co/datasets/gsm8k))
+- [Calc Aqua-RAT](https://huggingface.co/datasets/emnlp2023/Calc-aqua_rat) ([original Aqua-RAT on HF](https://huggingface.co/datasets/aqua_rat))
+
 in a standard auto-regressive setup i.e. for a conditional next-token prediction with teacher-forced prefix.
-
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-The model was fine-tuned from [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) for TODO steps
-aiming to maximise exact-match ration on a validation split of the questions from [gsm8k dataset](https://huggingface.co/datasets/gsm8k).
-We fine-tune only TODO of the parameters finding that this circumvents overfitting to relatively small training dataset.
-
-The full training configuration can be identified from the [training script](https://github.com/emnlp2023/gadgets/blob/9185d1fc4b4812321179f8e5cad3e2f2a764f1df/examples/train_gsm8k_flan-t5-slice.py).
```
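The "conditional next-token prediction with teacher-forced prefix" objective mentioned above means that at every step the model is conditioned on the gold prefix of the target chain, never on its own earlier predictions. A minimal sketch with toy string tokens (illustrative only, not the project's training code):

```python
# Minimal illustration of teacher forcing: each training pair conditions
# on the *gold* prefix of the target sequence, never on the model's own
# earlier predictions. Toy string tokens stand in for real token IDs.

def teacher_forced_pairs(target, bos="<s>"):
    """Return (gold_prefix, next_token) pairs for one target sequence."""
    prefix, pairs = [bos], []
    for token in target:
        pairs.append((list(prefix), token))
        prefix.append(token)  # extend with the gold token (teacher forcing)
    return pairs

target = ["<gadget>", "5*160", "</gadget>", "<output>", "800", "</output>"]
for prefix, nxt in teacher_forced_pairs(target):
    print(prefix, "->", nxt)
```

In training, the cross-entropy loss over all such next-token predictions is computed in one parallel pass, since every prefix is known in advance.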