Graded Word Sense Disambiguation (WSD) Model

Model Summary

This model is a fine-tuned version of RoBERTa-Large for Graded Word Sense Disambiguation (WSD). Given a word in context and a candidate sense definition, it predicts the degree to which that sense applies, on a scale from 1 to 4, leveraging large-scale sense-annotated data. The model is based on the work outlined in:

Reference Paper: Pierluigi Cassotti, Nina Tahmasebi (2025). Sense-specific Historical Word Usage Generation.

The model handles graded WSD by producing continuous-valued predictions rather than hard sense labels, which makes it useful for nuanced applications in lexicography, computational linguistics, and historical text analysis.


Model Details

  • Base Model: roberta-large
  • Task: Graded Word Sense Disambiguation (WSD)
  • Fine-tuning Dataset: Oxford English Dictionary (OED) sense-annotated corpus
  • Training Steps:
    • Tokenizer augmented with special tokens (<t>, </t>) for marking target words in context.
    • Dataset preprocessed with sense annotations and word offsets.
    • Sentences containing sense-annotated words were split into train (90%) and validation (10%) sets.
    • Objective: Predicting a continuous label representing the applicability of a sense.
    • Evaluation Metric: Root Mean Squared Error (RMSE).
  • Batch Size: 32
  • Learning Rate: 2e-5
  • Epochs: 1
  • Optimizer: AdamW with weight decay of 0.01
  • Evaluation Strategy: step-based (every 10% of the dataset); a configuration sketch follows below.
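
For orientation, here is a minimal sketch of how these hyperparameters could be expressed with the Hugging Face Trainer API. The dataset variables and the concrete eval_steps value are placeholders rather than the authors' exact setup:

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
# Special tokens used to mark the target word in context.
tokenizer.add_special_tokens({"additional_special_tokens": ["<t>", "</t>"]})

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large",
    num_labels=1,               # single linear output head -> regression
    problem_type="regression",  # Trainer then applies MSE loss
)
model.resize_token_embeddings(len(tokenizer))  # account for <t>, </t>

args = TrainingArguments(
    output_dir="graded-wsd",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=1,
    weight_decay=0.01,          # AdamW with weight decay is the Trainer default
    eval_strategy="steps",      # older transformers versions: evaluation_strategy
    eval_steps=500,             # placeholder for "every 10% of the dataset"
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)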

Training & Fine-Tuning

Fine-tuning was performed using the Hugging Face Trainer API with a custom dataset loader. The dataset was processed as follows:

  1. Preprocessing

    • Example sentences were extracted from the OED and augmented with definitions.
    • The target word was highlighted with special tokens (<t>, </t>).
    • Each instance was labeled with a graded similarity score (see the preprocessing sketch after this list).
  2. Tokenization & Encoding

    • Tokenized with AutoTokenizer.from_pretrained("roberta-large").
    • Definitions were concatenated using the </s></s> separator for cross-sentence representation.
  3. Training Pipeline

    • Model fine-tuned on the regression task with a single linear output head.
    • Used Mean Squared Error (MSE) loss.
    • Evaluation on validation set using Root Mean Squared Error (RMSE).
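
To make steps 1-2 concrete, here is a small sketch of the preprocessing described above: wrapping the target word with the special tokens based on its character offsets, then encoding the context-definition pair. The mark_target helper is hypothetical, introduced only for illustration:

from transformers import AutoTokenizer

def mark_target(sentence: str, start: int, end: int) -> str:
    """Wrap the target word at character span [start, end) with <t> ... </t>."""
    return sentence[:start] + "<t>" + sentence[start:end] + "</t>" + sentence[end:]

tokenizer = AutoTokenizer.from_pretrained("ChangeIsKey/graded-wsd")

sentence = "The bank of the river was eroding due to the storm."
marked = mark_target(sentence, 4, 8)  # character offsets of "bank"
definition = "The land alongside a river or a stream."

# Context and definition are joined with the RoBERTa </s></s> separator,
# mirroring the cross-sentence encoding used during fine-tuning.
encoded = tokenizer(f"{marked} </s></s> {definition}",
                    truncation=True, return_tensors="pt")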

Usage

Example Code

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("ChangeIsKey/graded-wsd")
model = AutoModelForSequenceClassification.from_pretrained("ChangeIsKey/graded-wsd")

# The target word is marked with the <t> ... </t> special tokens.
sentence = "The <t>bank</t> of the river was eroding due to the storm."
target_word = "bank"
definition = "The land alongside a river or a stream."

# Context and candidate definition are joined with the RoBERTa </s></s>
# separator, matching the encoding used during fine-tuning.
tokenized_input = tokenizer(f"{sentence} </s></s> {definition}", truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    output = model(**tokenized_input)
    # The regression head has a single output, so logits holds one value.
    score = output.logits.item()

print(f"Graded Sense Score: {score}")

Input Format

  • Sentence: Contextual usage of the word, with the target marked by the <t> and </t> special tokens.
  • Target Word: The word to be disambiguated.
  • Definition: The dictionary definition of the candidate sense.

Output

  • A continuous score (between 1 and 4) indicating how well the given definition applies to the target word in its current context; higher scores indicate a better fit (see the note below).
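
Note that the score comes from an unbounded linear regression head, so predictions can occasionally fall slightly outside the 1-4 annotation range. If a strictly bounded score is needed downstream, clamping is a simple post-processing step (an assumption about downstream use, not something the model does itself):

score = min(4.0, max(1.0, score))  # clamp to the 1-4 annotation scale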

Citation

If you use this model, please cite the following paper:

@article{cassotti2025,
  title={Sense-specific Historical Word Usage Generation},
  author={Cassotti, Pierluigi and Tahmasebi, Nina},
  journal={TACL},
  year={2025}
}