Model Card for STEP

This model is pre-trained to perform (random) syntactic transformations of English sentences. The prefix given to the model determines which syntactic transformation to apply.

See Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations (Lindemann et al., 2024; https://arxiv.org/abs/2407.04543) for full details.
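
As a rough, hypothetical usage sketch (not taken from the paper): loading the model with 🤗 transformers and selecting a transformation via the prefix might look as follows. The Hub id, the prefix string, and the use of trust_remote_code are placeholders/assumptions to check against the actual repository.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder: substitute this model's actual Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code may be required if the checkpoint ships custom modeling code
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# The prefix selects the syntactic transformation; the exact prefix format is
# defined by the pre-training setup (see the paper), so this string is illustrative.
inputs = tokenizer("<transformation prefix>: the cat chased the dog .", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))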

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Matthias Lindemann
  • Funded by: UKRI, Huawei, Dutch National Science Foundation
  • Model type: Sequence-to-Sequence model
  • Language(s) (NLP): English
  • License: [More Information Needed]
  • Finetuned from model: T5-Base

Uses

Syntax-sensitive sequence-to-sequence tasks for English, such as passivization, question formation, and semantic parsing.

Direct Use

This model needs to be fine-tuned before downstream use, as pre-training only teaches it to perform random syntactic transformations.
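
For illustration only, a single fine-tuning step on one toy example pair might look like the sketch below (placeholder Hub id as above; a real setup would use a full dataset and a proper training loop or Trainer).

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Toy input/output pair for a syntax-sensitive task (here: passivization).
src = tokenizer("the cat chased the dog .", return_tensors="pt")
tgt = tokenizer("the dog was chased by the cat .", return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.zero_grad()
loss = model(**src, labels=tgt.input_ids).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()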

Bias, Risks, and Limitations

The model is based on T5 and was exposed to the C4 corpus (T5's pre-training data), and hence likely inherits biases from both.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Model Examination

We identified the following interpretable transformation look-up heads (see the paper for details) for UD relations, given as (layer, head) pairs with 0-based indexing:

{'cop': [(0, 3), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'expl': [(0, 7), (7, 11), (8, 2), (8, 11), (9, 6), (9, 7), (11, 11)],
 'amod': [(4, 6), (6, 6), (7, 11), (8, 0), (8, 11), (9, 5), (11, 11)],
 'compound': [(4, 6), (6, 6), (7, 6), (7, 11), (8, 11), (9, 5), (9, 7), (9, 11), (11, 11)],
 'det': [(4, 6), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5)],
 'nmod:poss': [(4, 6), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'advmod': [(4, 11), (6, 6), (7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'aux': [(4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'mark': [(4, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'fixed': [(5, 5), (8, 2), (8, 6), (9, 4), (9, 6), (10, 1), (10, 4), (10, 6), (10, 11), (11, 11)],
 'compound:prt': [(6, 2), (6, 6), (7, 11), (8, 2), (8, 6), (9, 4), (9, 6), (10, 4), (10, 6), (10, 11), (11, 11)],
 'acl': [(6, 6), (7, 11), (8, 2), (9, 4), (10, 6), (10, 11), (11, 11)],
 'nummod': [(6, 6), (7, 11), (8, 11), (9, 6), (11, 11)],
 'flat': [(6, 11), (7, 11), (8, 2), (8, 11), (9, 4), (10, 6), (10, 11), (11, 11)],
 'aux:pass': [(7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'iobj': [(7, 11), (10, 4), (10, 11)],
 'nsubj': [(7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'obj': [(7, 11), (10, 4), (10, 6), (10, 11), (11, 11)],
 'obl:tmod': [(7, 11), (9, 4), (10, 4), (10, 6), (11, 11)],
 'case': [(8, 11), (9, 5)],
 'cc': [(8, 11), (9, 5), (9, 6), (11, 11)],
 'obl:npmod': [(8, 11), (9, 6), (9, 11), (10, 6), (11, 11)],
 'punct': [(8, 11), (9, 6), (10, 6), (10, 11), (11, 5)],
 'csubj': [(9, 11), (10, 6), (11, 11)],
 'nsubj:pass': [(9, 11), (10, 6), (11, 11)],
 'obl': [(9, 11), (10, 6)],
 'acl:relcl': [(10, 6)],
 'advcl': [(10, 6), (11, 11)],
 'appos': [(10, 6), (10, 11), (11, 11)],
 'ccomp': [(10, 6)],
 'conj': [(10, 6)],
 'nmod': [(10, 6), (10, 11)],
 'vocative': [(10, 6)],
 'xcomp': [(10, 6), (10, 11)]}
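
As a sketch of how one might inspect such a head with 🤗 transformers, assuming (without confirmation from the paper) that the look-up heads refer to the decoder's cross-attention, and using the placeholder Hub id from above:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

layer, head = 10, 6  # e.g. the head listed above for 'acl:relcl' (0-based)
enc = tokenizer("the dog that barked ran away .", return_tensors="pt")
dec_ids = tokenizer("the dog ran away .", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(**enc, decoder_input_ids=dec_ids, output_attentions=True)
# cross_attentions: one tensor per layer of shape (batch, heads, tgt_len, src_len)
attn = out.cross_attentions[layer][0, head]
print(attn.argmax(dim=-1))  # source position each target token attends to most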

Environmental Impact

  • Hardware Type: NVIDIA GeForce RTX 2080 Ti
  • Hours used: 30

Technical Specifications

Model Architecture and Objective

T5-Base: 12 encoder and 12 decoder layers, 12 attention heads per layer, hidden dimensionality of 768.
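
These hyperparameters can be sanity-checked against the checkpoint's config, e.g.:

from transformers import AutoConfig

# Placeholder Hub id as above; trust_remote_code may be required here too.
cfg = AutoConfig.from_pretrained("namednil/step")
print(cfg.num_layers, cfg.num_heads, cfg.d_model)  # expected: 12 12 768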

Citation

BibTeX:

@misc{lindemann2024strengtheningstructuralinductivebiases,
      title={Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations}, 
      author={Matthias Lindemann and Alexander Koller and Ivan Titov},
      year={2024},
      eprint={2407.04543},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.04543}, 
}