Model Card for STEP

This model is pre-trained to perform (random) syntactic transformations of English sentences. The prefix given to the model determines which syntactic transformation to apply.

See Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations (Lindemann et al., 2024; https://arxiv.org/abs/2407.04543) for full details.
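
As a rough, hypothetical usage sketch (not taken from the paper): loading the model with 🤗 transformers and selecting a transformation via the prefix might look as follows. The Hub id, the prefix string, and the use of trust_remote_code are placeholders/assumptions to check against the actual repository.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder: substitute this model's actual Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code may be required if the checkpoint ships custom modeling code
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# The prefix selects the syntactic transformation; the exact prefix format is
# defined by the pre-training setup (see the paper), so this string is illustrative.
inputs = tokenizer("<transformation prefix>: the cat chased the dog .", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))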

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Matthias Lindemann
  • Funded by: UKRI, Huawei, Dutch National Science Foundation
  • Model type: Sequence-to-Sequence model
  • Language(s) (NLP): English
  • License: [More Information Needed]
  • Finetuned from model: T5-Base

Uses

Syntax-sensitive sequence-to-sequence tasks for English, such as passivization, question formation, and semantic parsing.

Direct Use

This model needs to be fine-tuned before downstream use, as pre-training only teaches it to perform random syntactic transformations.
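
For illustration only, a single fine-tuning step on one toy example pair might look like the sketch below (placeholder Hub id as above; a real setup would use a full dataset and a proper training loop or Trainer).

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Toy input/output pair for a syntax-sensitive task (here: passivization).
src = tokenizer("the cat chased the dog .", return_tensors="pt")
tgt = tokenizer("the dog was chased by the cat .", return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.zero_grad()
loss = model(**src, labels=tgt.input_ids).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()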

Bias, Risks, and Limitations

The model is based on T5 and was exposed to the C4 corpus (T5's pre-training data), and hence likely inherits biases from both.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

Model Examination

We identified the following interpretable transformation look-up heads (see the paper for details) for UD relations, given as (layer, head) pairs with 0-based indexing:

{'cop': [(0, 3), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'expl': [(0, 7), (7, 11), (8, 2), (8, 11), (9, 6), (9, 7), (11, 11)],
 'amod': [(4, 6), (6, 6), (7, 11), (8, 0), (8, 11), (9, 5), (11, 11)],
 'compound': [(4, 6), (6, 6), (7, 6), (7, 11), (8, 11), (9, 5), (9, 7), (9, 11), (11, 11)],
 'det': [(4, 6), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5)],
 'nmod:poss': [(4, 6), (4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'advmod': [(4, 11), (6, 6), (7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'aux': [(4, 11), (7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'mark': [(4, 11), (8, 11), (9, 5), (9, 6), (11, 11)],
 'fixed': [(5, 5), (8, 2), (8, 6), (9, 4), (9, 6), (10, 1), (10, 4), (10, 6), (10, 11), (11, 11)],
 'compound:prt': [(6, 2), (6, 6), (7, 11), (8, 2), (8, 6), (9, 4), (9, 6), (10, 4), (10, 6), (10, 11), (11, 11)],
 'acl': [(6, 6), (7, 11), (8, 2), (9, 4), (10, 6), (10, 11), (11, 11)],
 'nummod': [(6, 6), (7, 11), (8, 11), (9, 6), (11, 11)],
 'flat': [(6, 11), (7, 11), (8, 2), (8, 11), (9, 4), (10, 6), (10, 11), (11, 11)],
 'aux:pass': [(7, 11), (8, 11), (9, 5), (9, 6), (10, 5), (11, 11)],
 'iobj': [(7, 11), (10, 4), (10, 11)],
 'nsubj': [(7, 11), (8, 11), (9, 5), (9, 6), (9, 11), (11, 11)],
 'obj': [(7, 11), (10, 4), (10, 6), (10, 11), (11, 11)],
 'obl:tmod': [(7, 11), (9, 4), (10, 4), (10, 6), (11, 11)],
 'case': [(8, 11), (9, 5)],
 'cc': [(8, 11), (9, 5), (9, 6), (11, 11)],
 'obl:npmod': [(8, 11), (9, 6), (9, 11), (10, 6), (11, 11)],
 'punct': [(8, 11), (9, 6), (10, 6), (10, 11), (11, 5)],
 'csubj': [(9, 11), (10, 6), (11, 11)],
 'nsubj:pass': [(9, 11), (10, 6), (11, 11)],
 'obl': [(9, 11), (10, 6)],
 'acl:relcl': [(10, 6)],
 'advcl': [(10, 6), (11, 11)],
 'appos': [(10, 6), (10, 11), (11, 11)],
 'ccomp': [(10, 6)],
 'conj': [(10, 6)],
 'nmod': [(10, 6), (10, 11)],
 'vocative': [(10, 6)],
 'xcomp': [(10, 6), (10, 11)]}
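
As a sketch of how one might inspect such a head with 🤗 transformers, assuming (without confirmation from the paper) that the look-up heads refer to the decoder's cross-attention, and using the placeholder Hub id from above:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "namednil/step"  # placeholder Hub id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

layer, head = 10, 6  # e.g. the head listed above for 'acl:relcl' (0-based)
enc = tokenizer("the dog that barked ran away .", return_tensors="pt")
dec_ids = tokenizer("the dog ran away .", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(**enc, decoder_input_ids=dec_ids, output_attentions=True)
# cross_attentions: one tensor per layer of shape (batch, heads, tgt_len, src_len)
attn = out.cross_attentions[layer][0, head]
print(attn.argmax(dim=-1))  # source position each target token attends to most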

Environmental Impact

  • Hardware Type: NVIDIA GeForce RTX 2080 Ti
  • Hours used: 30

Technical Specifications

Model Architecture and Objective

T5-Base: 12 encoder and 12 decoder layers, 12 attention heads per layer, hidden dimensionality of 768.
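
These hyperparameters can be sanity-checked against the checkpoint's config, e.g.:

from transformers import AutoConfig

# Placeholder Hub id as above; trust_remote_code may be required here too.
cfg = AutoConfig.from_pretrained("namednil/step")
print(cfg.num_layers, cfg.num_heads, cfg.d_model)  # expected: 12 12 768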

Citation

BibTeX:

@misc{lindemann2024strengtheningstructuralinductivebiases,
      title={Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations}, 
      author={Matthias Lindemann and Alexander Koller and Ivan Titov},
      year={2024},
      eprint={2407.04543},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.04543}, 
}