Semantic Specialization for Knowledge-based Word Sense Disambiguation

This repository contains the trained model (projection heads) and sense/context embeddings used for training and evaluating the model.
If you want to learn how to use these files, please refer to the semantic_specialization_for_wsd repository.

Trained Model (Projection Heads)

File: checkpoints/baseline/last.ckpt
This is one of the trained models used to report the main results (Table 2 in [Mizuki and Okazaki, EACL2023]).
NOTE: Five runs were performed in total; this checkpoint corresponds to one of them.
The main hyperparameters used for training are as follows:

Argument name	Value	Description
max_epochs	15	Maximum number of training epochs
cfg_similarity_class.temperature ($\beta^{-1}$)	0.015625 (=1/64)	Temperature parameter for the contrastive loss
batch_size ($N_B$)	256	Number of samples in each batch for the attract-repel and self-training objectives
coef_max_pool_margin_loss ($\alpha$)	0.2	Coefficient for the self-training loss
cfg_gloss_projection_head.n_layer	2	Number of FFNN layers for the projection heads
cfg_gloss_projection_head.max_l2_norm_ratio ($\epsilon$)	0.015	Hyperparameter for the distance constraint integrated in the projection heads
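
To illustrate what the temperature value in the table does, here is a minimal sketch of a generic temperature-scaled contrastive (softmax cross-entropy) loss in plain Python. It is an assumption-laden illustration, not the exact loss used in the paper: the function names `scaled_softmax` and `contrastive_loss` and the toy similarity values are hypothetical, and only the temperature value itself comes from the table above.

```python
import math

# Value from the table above: temperature = 1/64, i.e. beta = 64.
TEMPERATURE = 0.015625
BETA = 1.0 / TEMPERATURE  # inverse temperature


def scaled_softmax(similarities, temperature=TEMPERATURE):
    """Softmax over similarities divided by the temperature (hypothetical helper)."""
    logits = [s / temperature for s in similarities]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_loss(similarities, positive_index, temperature=TEMPERATURE):
    """Negative log-probability of the positive candidate (generic form, assumed)."""
    probs = scaled_softmax(similarities, temperature)
    return -math.log(probs[positive_index])


# Toy cosine similarities between one query and three candidates (made-up numbers).
toy_similarities = [0.9, 0.8, 0.1]
toy_probs = scaled_softmax(toy_similarities)
toy_loss = contrastive_loss(toy_similarities, positive_index=0)
```

With a temperature this small, a similarity gap of only 0.1 becomes a logit gap of 6.4, so the softmax concentrates almost all of its mass on the most similar candidate.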

Sense/Context Embeddings

Directory: data/bert_embeddings/
Sense embeddings: bert-large-cased_WordNet_Gloss_Corpus.hdf5
Context embeddings for the self-training objective: bert-large-cased_SemCor.hdf5
Context embeddings for evaluating the WSD task: bert-large-cased_WSDEval-ALL.hdf5
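
The internal layout of these HDF5 files is not described here, so a reasonable first step is to list what they contain. The sketch below assumes the `h5py` package is installed; `summarize_hdf5` is a hypothetical helper, not part of this repository, and it makes no assumption about dataset names inside the files.

```python
import h5py


def summarize_hdf5(path, max_items=10):
    """Return up to max_items (name, shape, dtype) tuples for datasets in an HDF5 file."""
    summary = []

    def visit(name, obj):
        # Record datasets only; groups are traversed but not listed.
        if isinstance(obj, h5py.Dataset):
            summary.append((name, obj.shape, str(obj.dtype)))
            if len(summary) >= max_items:
                return True  # any non-None return value stops the traversal
        return None

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Example usage (the path assumes the directory layout listed above):
#   for name, shape, dtype in summarize_hdf5(
#       "data/bert_embeddings/bert-large-cased_SemCor.hdf5"
#   ):
#       print(name, shape, dtype)
```

Limiting the listing with `max_items` keeps the output manageable if a file holds many datasets. For the actual loading code, the semantic_specialization_for_wsd repository remains the authoritative reference.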

Reference

@inproceedings{Mizuki:EACL2023,
    title     = "Semantic Specialization for Knowledge-based Word Sense Disambiguation",
    author    = "Mizuki, Sakae and Okazaki, Naoaki",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    series    = "EACL",
    month     = may,
    year      = "2023",
    address   = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    pages     = "3449--3462",
}

An arXiv version is also available.