FLAN-T5-Definition Base

This model is a version of FLAN-T5 Base fine-tuned on a dataset of English definitions and usage examples.

It generates definitions of English words in context. Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?"
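As a rough illustration of the input format described above, the sketch below builds a prompt and passes it to the model through the Transformers library. The helper names `build_prompt` and `generate_definition` are hypothetical, and joining the usage example and the question with a single space is an assumption, not something this card specifies. The model identifier is the one this card refers to.

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Join a usage example with the instruction question.

    Assumption: the example and the question are separated by one space.
    """
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def generate_definition(
    usage_example: str,
    target_word: str,
    model_name: str = "ltg/flan-t5-definition-en-base",
    max_new_tokens: int = 32,  # illustrative value, not taken from the card
) -> str:
    """Generate a definition for target_word as used in usage_example."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_definition("He sat on the bank of the river.", "bank")` downloads the model on first use and returns the generated definition as a string. Decoding settings such as beam search are left at their defaults here and may need tuning.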

This project is a collaboration between the Dialogue Modelling Group at the University of Amsterdam and the Language Technology Group at the University of Oslo.

Sizes:

Model description

See details in the paper Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis (ACL'2023) by Mario Giulianelli, Iris Luden, Raquel Fernandez and Andrey Kutuzov.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.

The fine-tuning datasets were limited to English. Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate its ability to generate definitions in languages other than English.

Generated definitions may reflect biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Three datasets were used to fine-tune the model:

FLAN-T5-Definition Base achieves the following results on the WordNet test set:

  • BLEU: 10.38
  • ROUGE-L: 27.17
  • BERT-F1: 88.22

FLAN-T5-Definition Base achieves the following results on the Oxford dictionary test set:

  • BLEU: 7.18
  • ROUGE-L: 23.04
  • BERT-F1: 86.90
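To make the ROUGE-L figures above easier to interpret, here is a minimal sketch of a sentence-level ROUGE-L F1 score based on the longest common subsequence (LCS) of tokens. This card does not state which scorer or tokenization produced the reported numbers, so the lowercasing and whitespace tokenization below are assumptions, and the output will not necessarily match an official implementation.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 in the range [0, 1] for one candidate/reference pair."""
    # Assumption: lowercase and split on whitespace.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

The scores in this card are reported on a 0 to 100 scale, so a value from `rouge_l_f1` would be multiplied by 100 and averaged over the test set to be comparable.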

Training procedure

FLAN-T5 Base was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15.0
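The linear scheduler listed above decays the learning rate from its initial value to zero over the course of training. The sketch below reproduces that shape in plain Python. The total step count is derived from the 2740 steps per epoch shown in the training results table of this card. The absence of warmup is an assumption, since no warmup setting is listed here.

```python
BASE_LR = 5e-05          # learning_rate from the list above
STEPS_PER_EPOCH = 2740   # taken from the training results table
NUM_EPOCHS = 15
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 41100


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear decay.

    Assumption: warmup_steps defaults to 0 because the card lists none.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate is halved at step 20550 (midway through epoch 8) and reaches zero at step 41100, the final step in the training results table.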

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2 | RougeL  | RougeLsum | Gen Len |
|---------------|-------|-------|-----------------|---------|--------|---------|-----------|---------|
| 2.5645        | 1.0   | 2740  | 2.2535          | 24.4437 | 6.4189 | 22.7949 | 22.7909   | 11.4969 |
| 2.3501        | 2.0   | 5480  | 2.1642          | 25.6642 | 7.289  | 23.8689 | 23.8749   | 11.7150 |
| 2.2516        | 3.0   | 8220  | 2.1116          | 26.4562 | 7.8955 | 24.6275 | 24.6376   | 11.7441 |
| 2.1806        | 4.0   | 10960 | 2.0737          | 27.0392 | 8.2393 | 25.1555 | 25.1641   | 11.7930 |
| 2.1233        | 5.0   | 13700 | 2.0460          | 27.2709 | 8.4244 | 25.3847 | 25.4003   | 11.9014 |
| 2.0765        | 6.0   | 16440 | 2.0236          | 27.5456 | 8.6096 | 25.6321 | 25.6462   | 11.8113 |
| 2.0371        | 7.0   | 19180 | 2.0047          | 27.7209 | 8.7277 | 25.7871 | 25.8084   | 11.6875 |
| 2.0036        | 8.0   | 21920 | 1.9918          | 28.0431 | 8.9863 | 26.1072 | 26.1198   | 11.5487 |
| 1.9752        | 9.0   | 24660 | 1.9788          | 28.1807 | 9.0219 | 26.1692 | 26.1886   | 11.7939 |
| 1.9513        | 10.0  | 27400 | 1.9702          | 28.3204 | 9.1572 | 26.2955 | 26.3029   | 11.5936 |
| 1.9309        | 11.0  | 30140 | 1.9640          | 28.4289 | 9.2845 | 26.4006 | 26.418    | 11.8371 |
| 1.9144        | 12.0  | 32880 | 1.9571          | 28.4504 | 9.3406 | 26.4273 | 26.4384   | 11.6201 |
| 1.9013        | 13.0  | 35620 | 1.9544          | 28.6319 | 9.3682 | 26.605  | 26.613    | 11.7067 |
| 1.8914        | 14.0  | 38360 | 1.9512          | 28.6435 | 9.3976 | 26.5839 | 26.5918   | 11.7307 |
| 1.8866        | 15.0  | 41100 | 1.9509          | 28.6111 | 9.3857 | 26.551  | 26.5648   | 11.7470 |

Framework versions

  • Transformers 4.24.0
  • PyTorch 1.11.0
  • Datasets 2.3.2
  • Tokenizers 0.12.1

Citation

@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fernandez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}

Model size: 248M params (Safetensors, tensor type F32)