Tags: Transformers · English · word_sense_disambiguation · Inference Endpoints
Sakae Mizuki committed · Commit 0a49ec5 · 1 Parent(s): 96177f0

feat: initial commit
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.hdf5 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
+ # Semantic Specialization for Knowledge-based Word Sense Disambiguation
+ * This repository contains the trained model (projection heads) and the sense/context embeddings used to train and evaluate it.
+ * To learn how to use these files, see the [semantic_specialization_for_wsd](https://github.com/s-mizuki-nlp/semantic_specialization_for_wsd) repository.
+
+ ## Trained Model (Projection Heads)
+ * File: `checkpoints/baseline/last.ckpt`
+ * This is one of the trained models used to report the main results (Table 2 in [Mizuki and Okazaki, EACL2023]). Note that five runs were performed in total.
+ * The main hyperparameters used for training are as follows:
+
+ | Argument name | Value | Description |
+ |---------------|-------|-------------|
+ | max_epochs | 15 | Maximum number of training epochs |
+ | cfg_similarity_class.temperature ($\beta^{-1}$) | 0.015625 (= 1/64) | Temperature parameter for the contrastive loss |
+ | batch_size ($N_B$) | 256 | Number of samples per batch for the attract-repel and self-training objectives |
+ | coef_max_pool_margin_loss ($\alpha$) | 0.2 | Coefficient for the self-training loss |
+ | cfg_gloss_projection_head.n_layer | 2 | Number of FFNN layers in the projection heads |
+ | cfg_gloss_projection_head.max_l2_norm_ratio ($\epsilon$) | 0.015 | Hyperparameter for the distance constraint integrated into the projection heads |
+
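For intuition, the temperature above can be read as the usual scaling factor in an InfoNCE-style contrastive objective. The sketch below is not the authors' implementation (see their repository for that); the function name, the NumPy formulation, and the one-gold-sense-per-context setup are illustrative assumptions.

```python
import numpy as np

def attract_repel_loss(context_emb, sense_emb, gold_idx, temperature=1 / 64):
    """Illustrative InfoNCE-style contrastive loss: attract each context
    embedding to its gold sense, repel it from the other senses in the batch.

    This is a sketch of the general technique, not the paper's exact objective.
    """
    # L2-normalize so the dot product below is cosine similarity
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    s = sense_emb / np.linalg.norm(sense_emb, axis=1, keepdims=True)
    logits = (c @ s.T) / temperature  # low temperature -> sharper softmax
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the gold sense for each context
    return -log_prob[np.arange(len(gold_idx)), gold_idx].mean()
```

With $\beta^{-1} = 1/64$, a gold cosine similarity only slightly above the negatives' already dominates the softmax, which is one way such a small temperature can pair with a large batch size like 256.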
+ ## Sense/Context Embeddings
+ * Directory: `data/bert_embeddings/`
+ * Sense embeddings: `bert-large-cased_WordNet_Gloss_Corpus.hdf5`
+ * Context embeddings for the self-training objective: `bert-large-cased_SemCor.hdf5`
+ * Context embeddings for evaluating the WSD task: `bert-large-cased_WSDEval-ALL.hdf5`
+
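These files are plain HDF5 containers, so they can be inspected with `h5py`. The dataset name `"embeddings"` below is an assumption, not the documented layout of the provided files; list the keys of the real files first. The toy round-trip only illustrates the access pattern.

```python
import os
import tempfile

import h5py
import numpy as np

def load_embeddings(path, dataset="embeddings"):
    """Read one dataset from an HDF5 file into memory.

    The dataset name "embeddings" is an assumption; inspect the actual
    files with `list(f.keys())` before relying on it.
    """
    with h5py.File(path, "r") as f:
        return f[dataset][:]  # slice [:] materializes the full array

# Toy round-trip against a temporary file, illustrating the access pattern.
path = os.path.join(tempfile.mkdtemp(), "toy.hdf5")
original = np.arange(12, dtype=np.float32).reshape(3, 4)
with h5py.File(path, "w") as f:
    f.create_dataset("embeddings", data=original)
loaded = load_embeddings(path)
```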
+ ## Reference
+
+ ```bibtex
+ @inproceedings{Mizuki:EACL2023,
+     title = "Semantic Specialization for Knowledge-based Word Sense Disambiguation",
+     author = "Mizuki, Sakae and Okazaki, Naoaki",
+     booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
+     series = "EACL",
+     month = may,
+     year = "2023",
+     address = "Dubrovnik, Croatia",
+     publisher = "Association for Computational Linguistics",
+     pages = "3449--3462",
+ }
+ ```
+
  ---
baseline.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d383629f69d5b4e6b199ccf9527ec61c7000bae92cf24272a75ea4d2f0ddfb70
+ size 75634843
bert-large-cased_WSDEval-ALL.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:766b7202f3d8d647ca40f8cf8e8167145d4a63343a8539b2de065e7600a41eff
+ size 139880832
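The two ADDED files above are Git LFS pointer stubs, not the payloads themselves; cloning with `git lfs` installed (or running `git lfs pull`) replaces them with the real data. A pointer is just three `key value` lines, so it can be parsed directly; `parse_lfs_pointer` is an illustrative helper, not part of any tool.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # payload size in bytes
    return fields

# The pointer content of baseline.ckpt, as shown in the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d383629f69d5b4e6b199ccf9527ec61c7000bae92cf24272a75ea4d2f0ddfb70
size 75634843
"""
info = parse_lfs_pointer(pointer)
```

The `size` field lets you verify a download (the fetched `baseline.ckpt` should be exactly 75,634,843 bytes), and the `oid` is the SHA-256 of the payload.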