Mike committed on
Commit
1a9bb0b
1 Parent(s): dd57515

add escoxlmr skill extraction model

README.md CHANGED
@@ -1,3 +1,29 @@
  ---
  license: apache-2.0
  ---
+
+ This is a demo using the models from:
+
+ ```
+ @inproceedings{zhang-etal-2023-escoxlm,
+ title = "{ESCOXLM}-{R}: Multilingual Taxonomy-driven Pre-training for the Job Market Domain",
+ author = "Zhang, Mike and
+ van der Goot, Rob and
+ Plank, Barbara",
+ editor = "Rogers, Anna and
+ Boyd-Graber, Jordan and
+ Okazaki, Naoaki",
+ booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+ month = jul,
+ year = "2023",
+ address = "Toronto, Canada",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2023.acl-long.662",
+ doi = "10.18653/v1/2023.acl-long.662",
+ pages = "11871--11890",
+ abstract = "The increasing number of benchmarks for Natural Language Processing (NLP) tasks in the computational job market domain highlights the demand for methods that can handle job-related tasks such as skill extraction, skill classification, job title classification, and de-identification. While some approaches have been developed that are specific to the job market domain, there is a lack of generalized, multilingual models and benchmarks for these tasks. In this study, we introduce a language model called ESCOXLM-R, based on XLM-R-large, which uses domain-adaptive pre-training on the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations. We comprehensively evaluate the performance of ESCOXLM-R on 6 sequence labeling and 3 classification tasks in 4 languages and find that it achieves state-of-the-art results on 6 out of 9 datasets. Our analysis reveals that ESCOXLM-R performs better on short spans and outperforms XLM-R-large on entity-level and surface-level span-F1, likely due to ESCO containing short skill and occupation titles, and encoding information on the entity-level.",
+ }
+ ```
+
+ Note that there is another endpoint, namely `jjzha/escoxlmr_skill_extraction`.
+ Knowledge can be seen as hard skills and Skills are both soft and applied skills.
config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a01c1fa560f27ddd3060d450da1af553069f12f4ecc52d9179e991958510dfdb
+ size 991
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:051a4de5fff496d1201910ca951a65de9c4f31c21d8eb4e4a8a677d7272ed634
+ size 2235565297
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5469a60db23249c7f8945013d78df30b44b6bf686c6bb4740f4223f77b1b535
+ size 279
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:384f4900cbdbb808269c3d858ed426e2d66bdf41a5da49bbd92bda0e31d7ea8b
+ size 17082757
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ff84f7ba61f91e0bfdc636b6f2f92498970098b069efadfbe1a0c07f6b56751
+ size 463
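
Note: the `version` / `oid` / `size` triplets added in this commit are Git LFS pointer files, not the model artifacts themselves. As a minimal sketch of how such a pointer could be read, the Python below parses the `config.json` pointer from this commit. The helper name `parse_lfs_pointer` and the keys of the returned dictionary are hypothetical, chosen only for illustration, and are not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the config.json hunk of this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a01c1fa560f27ddd3060d450da1af553069f12f4ecc52d9179e991958510dfdb
size 991
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The recorded `size` and SHA-256 digest describe the real file, so after downloading an artifact they could be compared against its byte length and `hashlib.sha256` digest as an integrity check.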