# Data Format
You can feed SpanFinder data in any format, as long as you implement a dataset reader that inherits from `SpanReader`. A dataset reader for Concrete data is also provided. In addition, SpanFinder defines its own JSON data format, which supports richer features for training and modeling.
A minimal example of this JSON format:
```JSON
{
  "meta": {
    "fully_annotated": true
  },
  "tokens": ["Bob", "attacks", "the", "building", "."],
  "annotations": [
    {
      "span": [1, 1],
      "label": "Attack",
      "children": [
        {
          "span": [0, 0],
          "label": "Assailant",
          "children": []
        },
        {
          "span": [2, 3],
          "label": "Victim",
          "children": []
        }
      ]
    },
    {
      "span": [3, 3],
      "label": "Buildings",
      "children": [
        {
          "span": [3, 3],
          "label": "Building",
          "children": []
        }
      ]
    }
  ]
}
```
Spans can be nested to arbitrary depth: any entry in `children` may itself have its own `children`.
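To make the format concrete, here is a minimal Python sketch that loads the example document and walks its annotations recursively, which handles nesting of any depth. It assumes, based on the example, that `span` holds inclusive `[start, end]` token indices (so `[2, 3]` covers "the building"). The `walk` helper is hypothetical and not part of SpanFinder.

```python
import json
from typing import Iterator, List, Tuple

# The minimal example from above, inlined so the sketch is self-contained.
DOC = json.loads("""
{
  "meta": {"fully_annotated": true},
  "tokens": ["Bob", "attacks", "the", "building", "."],
  "annotations": [
    {"span": [1, 1], "label": "Attack", "children": [
      {"span": [0, 0], "label": "Assailant", "children": []},
      {"span": [2, 3], "label": "Victim", "children": []}
    ]},
    {"span": [3, 3], "label": "Buildings", "children": [
      {"span": [3, 3], "label": "Building", "children": []}
    ]}
  ]
}
""")


def walk(nodes: list, tokens: List[str], depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, label, surface text) for every span, recursing into children."""
    for node in nodes:
        start, end = node["span"]  # ASSUMPTION: inclusive token indices
        yield depth, node["label"], " ".join(tokens[start:end + 1])
        yield from walk(node.get("children", []), tokens, depth + 1)


for depth, label, text in walk(DOC["annotations"], DOC["tokens"]):
    print("  " * depth + f"{label}: {text}")
```

Because `walk` recurses once per nesting level, extremely deep trees would eventually hit Python's default recursion limit; an explicit stack avoids that if it ever matters.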
## Meta-info for Semantic Role Labeling (SRL)
```JSON
{
  "ontology": {
    "event": ["Violence-Attack"],
    "argument": ["Agent", "Patient"],
    "link": [[0, 0], [0, 1]]
  },
  "ontology_mapping": {
    "event": {
      "Attack": ["Violence-Attack", 0.8]
    },
    "argument": {
      "Assault": ["Agent", 0.95],
      "Victim": ["Patient", 0.9]
    }
  }
}
```
TODO: Guanghui needs to doc this.
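Pending that documentation, here is a minimal Python sketch of one plausible reading of the meta-info, inferred only from the example above and not confirmed by SpanFinder. It assumes each `link` pair is `[event_index, argument_index]` into the `ontology` lists (marking which roles an event type may take), and that each `ontology_mapping` entry translates a dataset label into `[ontology label, weight]`, with the number treated as a confidence. The helpers `allowed_arguments` and `map_label` are hypothetical, and the sketch writes the `Victim` target as `Patient` so it matches the ontology's `argument` list.

```python
import json
from typing import Dict, List, Optional, Tuple

# The SRL meta-info from above, inlined so the sketch is self-contained.
META = json.loads("""
{
  "ontology": {
    "event": ["Violence-Attack"],
    "argument": ["Agent", "Patient"],
    "link": [[0, 0], [0, 1]]
  },
  "ontology_mapping": {
    "event": {"Attack": ["Violence-Attack", 0.8]},
    "argument": {"Assault": ["Agent", 0.95], "Victim": ["Patient", 0.9]}
  }
}
""")


def allowed_arguments(meta: dict) -> Dict[str, List[str]]:
    """Map each event type to the argument roles linked to it.

    ASSUMPTION: each "link" entry is [event_index, argument_index].
    """
    onto = meta["ontology"]
    allowed: Dict[str, set] = {event: set() for event in onto["event"]}
    for event_idx, arg_idx in onto["link"]:
        allowed[onto["event"][event_idx]].add(onto["argument"][arg_idx])
    # Sort the roles so the result is deterministic.
    return {event: sorted(roles) for event, roles in allowed.items()}


def map_label(meta: dict, kind: str, label: str) -> Optional[Tuple[str, float]]:
    """Translate a dataset label of the given kind ("event" or "argument").

    ASSUMPTION: the number is a confidence/weight. Returns None if unmapped.
    """
    entry = meta["ontology_mapping"][kind].get(label)
    if entry is None:
        return None
    target, weight = entry
    return target, weight


print(allowed_arguments(META))                # {'Violence-Attack': ['Agent', 'Patient']}
print(map_label(META, "argument", "Victim"))  # ('Patient', 0.9)
```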