---
language: ja
thumbnail: https://github.com/studio-ousia/luke/raw/master/resources/luke_logo.png
tags:
- luke
- named entity recognition
- entity typing
- relation classification
- question answering
license: apache-2.0
---
luke-japanese
luke-japanese is the Japanese version of LUKE (Language Understanding with Knowledge-based Embeddings), a pre-trained knowledge-enhanced contextualized representation of words and entities. LUKE treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Please refer to our GitHub repository for more details and updates.
This model contains Wikipedia entity embeddings that are not used in typical NLP tasks. For tasks that take only words as input and do not use Wikipedia entities, please use the lite version.
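Because LUKE treats entities as separate input tokens, each entity has to be tied to a span of characters in the text. The LUKE tokenizer classes in Hugging Face `transformers` accept such spans through the `entity_spans` argument (character-based `(start, end)` pairs, end exclusive), optionally together with `entities` naming the corresponding Wikipedia entities. The sketch below is a minimal, hypothetical helper (`find_entity_spans` is not part of LUKE or `transformers`) that computes those offsets for Japanese text, assuming the mentions are listed in the order they appear.

```python
from typing import List, Tuple


def find_entity_spans(text: str, mentions: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, end exclusive, for each mention.

    Hypothetical helper: mentions are matched left to right, and each search
    starts where the previous match ended, so repeated surface forms map to
    successive occurrences in the text.
    """
    spans: List[Tuple[int, int]] = []
    cursor = 0
    for mention in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"mention {mention!r} not found after offset {cursor}")
        end = start + len(mention)
        spans.append((start, end))
        cursor = end
    return spans


if __name__ == "__main__":
    text = "東京は日本の首都です。"
    spans = find_entity_spans(text, ["東京", "日本"])
    print(spans)  # [(0, 2), (3, 5)]
    # Assumption: these spans would then be passed to a LUKE tokenizer, e.g.
    # tokenizer(text, entity_spans=spans); loading the tokenizer is omitted
    # here so the sketch runs without downloading the model.
```

Advancing a cursor after each match keeps repeated mentions such as 日本 in 日本と日本語 from collapsing onto the same span.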
Experimental results on JGLUE
The following table shows the results on the JGLUE dev set:
| Model | MARC-ja (acc) | JSTS (Pearson/Spearman) | JNLI (acc) | JCommonsenseQA (acc) |
|---|---|---|---|---|
| LUKE Japanese base | 0.965 | 0.912/0.875 | 0.912 | 0.842 |
| Baselines: | | | | |
| Tohoku BERT base | 0.958 | 0.899/0.859 | 0.899 | 0.808 |
| NICT BERT base | 0.958 | 0.903/0.867 | 0.902 | 0.823 |
| Waseda RoBERTa base | 0.962 | 0.901/0.865 | 0.895 | 0.840 |
| XLM RoBERTa base | 0.961 | 0.870/0.825 | 0.893 | 0.687 |
The baseline scores are obtained from here.
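To make the comparison in the table concrete, the following sketch computes, for each metric, how far LUKE Japanese base is ahead of the strongest baseline. It uses only the dev-set numbers listed in the table; the dictionary layout and the `margin_over_best_baseline` name are illustrative choices, not part of any released tooling.

```python
from typing import Dict

# Dev-set scores copied from the JGLUE table; JSTS is split into its
# Pearson and Spearman columns.
SCORES: Dict[str, Dict[str, float]] = {
    "LUKE Japanese base": {"MARC-ja": 0.965, "JSTS Pearson": 0.912, "JSTS Spearman": 0.875, "JNLI": 0.912, "JCommonsenseQA": 0.842},
    "Tohoku BERT base": {"MARC-ja": 0.958, "JSTS Pearson": 0.899, "JSTS Spearman": 0.859, "JNLI": 0.899, "JCommonsenseQA": 0.808},
    "NICT BERT base": {"MARC-ja": 0.958, "JSTS Pearson": 0.903, "JSTS Spearman": 0.867, "JNLI": 0.902, "JCommonsenseQA": 0.823},
    "Waseda RoBERTa base": {"MARC-ja": 0.962, "JSTS Pearson": 0.901, "JSTS Spearman": 0.865, "JNLI": 0.895, "JCommonsenseQA": 0.840},
    "XLM RoBERTa base": {"MARC-ja": 0.961, "JSTS Pearson": 0.870, "JSTS Spearman": 0.825, "JNLI": 0.893, "JCommonsenseQA": 0.687},
}


def margin_over_best_baseline(scores: Dict[str, Dict[str, float]], model: str) -> Dict[str, float]:
    """Return the model's score minus the best score among all other models, per metric."""
    margins: Dict[str, float] = {}
    for metric, value in scores[model].items():
        best_other = max(s[metric] for name, s in scores.items() if name != model)
        # Round to three decimals to match the precision reported in the table.
        margins[metric] = round(value - best_other, 3)
    return margins


if __name__ == "__main__":
    for metric, margin in margin_over_best_baseline(SCORES, "LUKE Japanese base").items():
        print(f"{metric}: {margin:+.3f}")
```

Run as-is, it reports a positive margin on every metric, ranging from +0.002 on JCommonsenseQA to +0.010 on JNLI.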
Citation
@inproceedings{yamada2020luke,
title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
booktitle={EMNLP},
year={2020}
}