# Graded Word Sense Disambiguation (WSD) Model

## Model Summary
This model is a fine-tuned version of RoBERTa-Large for Graded Word Sense Disambiguation (WSD). It is designed to predict the degree of applicability (1-4) of a word sense in context by leveraging large-scale sense-annotated corpora. The model is based on the work outlined in:
Reference Paper: Pierluigi Cassotti, Nina Tahmasebi (2025). Sense-specific Historical Word Usage Generation.
The model is trained for graded WSD, producing continuous-valued predictions rather than hard sense labels, which makes it useful for nuanced applications in lexicography, computational linguistics, and historical text analysis.
## Model Details
- Base Model: `roberta-large`
- Task: Graded Word Sense Disambiguation (WSD)
- Fine-tuning Dataset: Oxford English Dictionary (OED) sense-annotated corpus
- Training Steps (a tokenizer-setup sketch follows this list):
  - Tokenizer augmented with special tokens (`<t>`, `</t>`) for marking target words in context.
  - Dataset preprocessed with sense annotations and word offsets.
  - Sentences containing sense-annotated words were split into train (90%) and validation (10%) sets.
- Objective: Predicting a continuous label representing the applicability of a sense.
- Evaluation Metric: Root Mean Squared Error (RMSE).
- Batch Size: 32
- Learning Rate: 2e-5
- Epochs: 1
- Optimizer: AdamW with weight decay of 0.01
- Evaluation Strategy: Steps-based (every 10% of the dataset).
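As a reference, here is a minimal sketch of the tokenizer-augmentation step described above, using the standard Hugging Face APIs. This is an illustration, not the authors' released training script:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
# num_labels=1 turns the classification head into a single regression output.
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=1)

# Register the target-word markers so the tokenizer never splits them.
tokenizer.add_special_tokens({"additional_special_tokens": ["<t>", "</t>"]})
# Grow the embedding matrix to cover the two new token ids.
model.resize_token_embeddings(len(tokenizer))
```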
## Training & Fine-Tuning
Fine-tuning was performed using the Hugging Face `Trainer` API with a custom dataset loader. The dataset was processed as follows:
### Preprocessing
- Example sentences were extracted from the OED and augmented with definitions.
- The target word was highlighted with special tokens (`<t>`, `</t>`).
- Each instance was labeled with a graded similarity score; a sketch of this step follows the list.
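A sketch of how one such instance could be assembled. The helper name `build_instance` and the field names are hypothetical, not part of the released code:

```python
def build_instance(sentence, start, end, definition, label):
    """Wrap the target word (given character offsets) and pair it with a sense definition."""
    marked = sentence[:start] + "<t>" + sentence[start:end] + "</t>" + sentence[end:]
    return {"text": f"{marked} </s></s> {definition}", "label": float(label)}

example = build_instance(
    "The bank of the river was eroding due to the storm.",
    start=4, end=8,
    definition="The land alongside a river or a stream.",
    label=4.0,  # graded applicability on the 1-4 scale
)
```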
### Tokenization & Encoding
- Tokenized with `AutoTokenizer.from_pretrained("roberta-large")`.
- Definitions were concatenated using the `</s></s>` separator for cross-sentence representation (illustrated below).
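As a side note, `</s></s>` is the same separator RoBERTa's tokenizer inserts when encoding a sentence pair, so the manual concatenation mirrors the model's native cross-sentence format. A quick illustrative check (the decoded string shown in the comment is approximate):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large")
enc = tok("The <t>bank</t> of the river.", "The land alongside a river or a stream.")
print(tok.decode(enc["input_ids"]))
# <s>The <t>bank</t> of the river.</s></s>The land alongside a river or a stream.</s>
```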
### Training Pipeline
- Model fine-tuned on the regression task with a single linear output head.
- Trained with Mean Squared Error (MSE) loss.
- Evaluated on the validation set using Root Mean Squared Error (RMSE); a `Trainer` sketch follows this list.
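Putting the pieces together, a hedged sketch of a `Trainer` setup matching the hyperparameters reported above. The dataset variables `train_ds` and `val_ds` are placeholders, `model` is the regression model from the earlier sketch, and note that `evaluation_strategy` is named `eval_strategy` in recent `transformers` releases:

```python
import numpy as np
from transformers import Trainer, TrainingArguments

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # RMSE, matching the evaluation metric reported above.
    return {"rmse": float(np.sqrt(np.mean((preds.squeeze() - labels) ** 2)))}

args = TrainingArguments(
    output_dir="graded-wsd",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=1,
    weight_decay=0.01,            # AdamW is the Trainer's default optimizer
    evaluation_strategy="steps",
)

# With num_labels=1, the model applies MSE loss to float labels automatically.
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=val_ds, compute_metrics=compute_metrics)
trainer.train()
```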
## Usage

### Example Code
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("ChangeIsKey/graded-wsd")
model = AutoModelForSequenceClassification.from_pretrained("ChangeIsKey/graded-wsd")
model.eval()

# The target word is marked with <t>...</t> directly in the context sentence.
sentence = "The <t>bank</t> of the river was eroding due to the storm."
target_word = "bank"  # the word being disambiguated (already marked above)
definition = "The land alongside a river or a stream."

# Context and candidate definition are joined with the </s></s> separator.
tokenized_input = tokenizer(f"{sentence} </s></s> {definition}",
                            truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    output = model(**tokenized_input)
    score = output.logits.item()

print(f"Graded Sense Score: {score}")
```
### Input Format
- Sentence: Contextual usage of the word.
- Target Word: The word to be disambiguated.
- Definition: The dictionary definition of the intended sense.
### Output
- A continuous score between 1 and 4 indicating how well the given definition applies to the target word in its current context.
## Citation
If you use this model, please cite the following paper:
```bibtex
@article{cassotti2025,
  title={Sense-specific Historical Word Usage Generation},
  author={Cassotti, Pierluigi and Tahmasebi, Nina},
  journal={TACL},
  year={2025}
}
```