Tags: Transformers · English · word_sense_disambiguation · Inference Endpoints
Sakae Mizuki committed · Commit 0a49ec5 · 1 Parent(s): 96177f0

feat: initial commit
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.hdf5 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
+ # Semantic Specialization for Knowledge-based Word Sense Disambiguation
+ * This repository contains the trained model (projection heads) and the sense/context embeddings used to train and evaluate it.
+ * To learn how to use these files, see the [semantic_specialization_for_wsd](https://github.com/s-mizuki-nlp/semantic_specialization_for_wsd) repository.
+
+ ## Trained Model (Projection Heads)
+ * File: `checkpoints/baseline/last.ckpt`
+ * This is one of the trained models used to report the main results (Table 2 in [Mizuki and Okazaki, EACL2023]). Note that five runs were performed in total.
+ * The main hyperparameters used for training are as follows:
+
+ | Argument name | Value | Description |
+ |---------------|-------|-------------|
+ | max_epochs | 15 | Maximum number of training epochs |
+ | cfg_similarity_class.temperature ($\beta^{-1}$) | 0.015625 (= 1/64) | Temperature parameter for the contrastive loss |
+ | batch_size ($N_B$) | 256 | Number of samples per batch for the attract-repel and self-training objectives |
+ | coef_max_pool_margin_loss ($\alpha$) | 0.2 | Coefficient for the self-training loss |
+ | cfg_gloss_projection_head.n_layer | 2 | Number of FFNN layers in the projection heads |
+ | cfg_gloss_projection_head.max_l2_norm_ratio ($\epsilon$) | 0.015 | Hyperparameter for the distance constraint integrated into the projection heads |
+
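For intuition, the temperature above can be read as the usual scaling factor in an InfoNCE-style contrastive objective. The sketch below is not the authors' implementation (see their repository for that); the function name, the NumPy formulation, and the one-gold-sense-per-context setup are illustrative assumptions.

```python
import numpy as np

def attract_repel_loss(context_emb, sense_emb, gold_idx, temperature=1 / 64):
    """Illustrative InfoNCE-style contrastive loss: attract each context
    embedding to its gold sense, repel it from the other senses in the batch.

    This is a sketch of the general technique, not the paper's exact objective.
    """
    # L2-normalize so the dot product below is cosine similarity
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    s = sense_emb / np.linalg.norm(sense_emb, axis=1, keepdims=True)
    logits = (c @ s.T) / temperature  # low temperature -> sharper softmax
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the gold sense for each context
    return -log_prob[np.arange(len(gold_idx)), gold_idx].mean()
```

With $\beta^{-1} = 1/64$, a gold cosine similarity only slightly above the negatives' already dominates the softmax, which is one way such a small temperature can pair with a large batch size like 256.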
+ ## Sense/Context Embeddings
+ * Directory: `data/bert_embeddings/`
+ * Sense embeddings: `bert-large-cased_WordNet_Gloss_Corpus.hdf5`
+ * Context embeddings for the self-training objective: `bert-large-cased_SemCor.hdf5`
+ * Context embeddings for evaluating the WSD task: `bert-large-cased_WSDEval-ALL.hdf5`
+
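These files are plain HDF5 containers, so they can be inspected with `h5py`. The dataset name `"embeddings"` below is an assumption, not the documented layout of the provided files; list the keys of the real files first. The toy round-trip only illustrates the access pattern.

```python
import os
import tempfile

import h5py
import numpy as np

def load_embeddings(path, dataset="embeddings"):
    """Read one dataset from an HDF5 file into memory.

    The dataset name "embeddings" is an assumption; inspect the actual
    files with `list(f.keys())` before relying on it.
    """
    with h5py.File(path, "r") as f:
        return f[dataset][:]  # slice [:] materializes the full array

# Toy round-trip against a temporary file, illustrating the access pattern.
path = os.path.join(tempfile.mkdtemp(), "toy.hdf5")
original = np.arange(12, dtype=np.float32).reshape(3, 4)
with h5py.File(path, "w") as f:
    f.create_dataset("embeddings", data=original)
loaded = load_embeddings(path)
```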
+ ## Reference
+
+ ```bibtex
+ @inproceedings{Mizuki:EACL2023,
+     title = "Semantic Specialization for Knowledge-based Word Sense Disambiguation",
+     author = "Mizuki, Sakae and Okazaki, Naoaki",
+     booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
+     series = "EACL",
+     month = may,
+     year = "2023",
+     address = "Dubrovnik, Croatia",
+     publisher = "Association for Computational Linguistics",
+     pages = "3449--3462",
+ }
+ ```
+
  ---
baseline.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d383629f69d5b4e6b199ccf9527ec61c7000bae92cf24272a75ea4d2f0ddfb70
+ size 75634843
bert-large-cased_WSDEval-ALL.hdf5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:766b7202f3d8d647ca40f8cf8e8167145d4a63343a8539b2de065e7600a41eff
+ size 139880832
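The two ADDED files above are Git LFS pointer stubs, not the payloads themselves; cloning with `git lfs` installed (or running `git lfs pull`) replaces them with the real data. A pointer is just three `key value` lines, so it can be parsed directly; `parse_lfs_pointer` is an illustrative helper, not part of any tool.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, oid, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # payload size in bytes
    return fields

# The pointer content of baseline.ckpt, as shown in the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d383629f69d5b4e6b199ccf9527ec61c7000bae92cf24272a75ea4d2f0ddfb70
size 75634843
"""
info = parse_lfs_pointer(pointer)
```

The `size` field lets you verify a download (the fetched `baseline.ckpt` should be exactly 75,634,843 bytes), and the `oid` is the SHA-256 of the payload.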