---
language:
- en
tags:
- NER
- named entity recognition
- RE
- relation extraction
- entity mention detection
- EMD
- coreference resolution
license: apache-2.0
datasets:
- Ontonotes
- CoNLL04
---

# CoReNer

## Demo

We released an online demo so you can easily play with the model. Check it out: [http://corener-demo.aiola-lab.com](http://corener-demo.aiola-lab.com). 
The demo uses the [aiola/roberta-base-corener](https://huggingface.co/aiola/roberta-base-corener) model.

## Model description

A multi-task model for named-entity recognition, relation extraction, entity mention detection, and coreference resolution.

We model NER as a span classification task and relation extraction as a multi-label classification of (NER) span tuples.
Similarly, we model EMD as a span classification task and CR as a binary classification of (EMD) span tuples.
To construct the CR clusters, we keep the top antecedent of each mention, then compute the connected components of the mentions' undirected graph.
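The cluster-construction step is simple enough to sketch directly. Below is a minimal, self-contained illustration of turning predicted top antecedents into coreference clusters via connected components; the mention spans and antecedent links are invented for the example, and the model's actual post-processing is handled inside the `corener` package.

```python
# Sketch of the coreference-cluster construction described above.
# Mentions and their predicted top antecedents are invented for illustration;
# in practice they come from the EMD and CR heads of the model.

from collections import defaultdict

# Each mention is a (start, end) token span; each link connects a mention
# to its top-scoring antecedent. Mentions without an antecedent stay singletons.
mentions = [(0, 2), (10, 11), (15, 16), (20, 22)]
top_antecedents = {
    (10, 11): (0, 2),    # e.g. a pronoun linked back to its antecedent
    (15, 16): (10, 11),  # chained link
    # (20, 22) has no antecedent -> singleton cluster
}

# Build an undirected graph over mentions.
graph = defaultdict(set)
for m in mentions:
    graph[m]  # ensure every mention is a node, even if isolated
for mention, antecedent in top_antecedents.items():
    graph[mention].add(antecedent)
    graph[antecedent].add(mention)

# Compute connected components with a simple DFS; each component is a cluster.
clusters, seen = [], set()
for node in mentions:
    if node in seen:
        continue
    stack, component = [node], set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        component.add(cur)
        stack.extend(graph[cur] - seen)
    clusters.append(sorted(component))

print(clusters)
# [[(0, 2), (10, 11), (15, 16)], [(20, 22)]]
```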

The model was trained to recognize: 
- Entity types: GPE, ORG, PERSON, DATE, NORP, CARDINAL, MONEY, PERCENT, WORK_OF_ART, ORDINAL, EVENT, LOC, TIME, FAC, QUANTITY, LAW, PRODUCT, LANGUAGE. 
- Relation types: Kill, Live_In, Located_In, OrgBased_In, Work_For.

## Usage example

See additional details and usage examples at: https://github.com/aiola-lab/corener.

```python
import json

from transformers import AutoTokenizer
from corener.models import Corener, ModelOutput
from corener.data import MTLDataset
from corener.utils.prediction import convert_model_output


tokenizer = AutoTokenizer.from_pretrained("aiola/roberta-base-corener")
model = Corener.from_pretrained("aiola/roberta-base-corener")
model.eval()

examples = [
    "Apple Park is the corporate headquarters of Apple Inc., located in Cupertino, California, United States. It was opened to employees in April 2017, while construction was still underway, and superseded the original headquarters at 1 Infinite Loop, which opened in 1993."
]

dataset = MTLDataset(
    types=model.config.types, 
    tokenizer=tokenizer,
    train_mode=False,
)
dataset.read_dataset(examples)
example = dataset.get_example(0)  # get first example

output: ModelOutput = model(
    input_ids=example.encodings,
    context_masks=example.context_masks,
    entity_masks=example.entity_masks,
    entity_sizes=example.entity_sizes,
    entity_spans=example.entity_spans,
    entity_sample_masks=example.entity_sample_masks,
    inference=True,
)

print(json.dumps(convert_model_output(output=output, batch=example, dataset=dataset), indent=2))
```