sdk: docker
pinned: true
---

# Extract Legal Entities from Insurance Documents using BERT Transformers

This Space uses a fine-tuned BERT Transformer model for NER of legal entities in Life Insurance demand letters.

The dataset is publicly available here:
https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender

The dataset consists of legal requisition/demand letters for Life Insurance; however, the same approach can be used in any industry and for any document type that may benefit from spatial data in NER training.
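
For illustration, here is roughly what that entity list looks like as a BIO tag set for token classification. The concrete label strings below are my assumption and may differ from the ones used in the repository:

```python
# Illustrative BIO tag set for the ten entity types listed above.
# The exact label strings used by the Space's converter may differ.
ENTITY_TYPES = [
    "LAW_FIRM", "LAW_OFFICE_ADDRESS", "INSURANCE_COMPANY",
    "INSURANCE_COMPANY_ADDRESS", "POLICY_HOLDER_NAME", "BENEFICIARY_NAME",
    "POLICY_NUMBER", "PAYOUT", "REQUIRED_ACTION", "SENDER",
]
# "O" marks tokens outside any entity; B-/I- mark the beginning/inside of a span.
LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for i, label in enumerate(LABELS)}
```
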
## Data preprocessing

The OCRed data is available as JSON under `data/raw_data/annotations`.
I wrote `source/services/ner/awscomprehend_2_ner_format.py` to convert that JSON into a format suitable for Hugging Face token classification.
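
As a rough sketch of what that conversion does (not the repository's actual implementation), assuming each annotation file carries the letter text plus Comprehend-style entity spans with `BeginOffset`, `EndOffset`, and `Type` fields (the real JSON layout may differ):

```python
# Hypothetical converter sketch: whitespace-tokenize each annotated letter and
# turn character-offset entity spans into BIO tags ("tokens" / "ner_tags").
import json
from pathlib import Path


def spans_to_bio(text: str, entities: list) -> tuple:
    """Assign a BIO tag to every whitespace token from character-offset spans."""
    tokens, tags, cursor = [], [], 0
    for token in text.split():
        start = text.index(token, cursor)  # character offset of this token
        end = start + len(token)
        cursor = end
        tag = "O"
        for ent in entities:
            if start >= ent["BeginOffset"] and end <= ent["EndOffset"]:
                prefix = "B" if start == ent["BeginOffset"] else "I"
                tag = f"{prefix}-{ent['Type']}"
                break
        tokens.append(token)
        tags.append(tag)
    return tokens, tags


if __name__ == "__main__":
    records = []
    for path in Path("data/raw_data/annotations").glob("*.json"):
        doc = json.loads(path.read_text())
        tokens, tags = spans_to_bio(doc["Text"], doc.get("Entities", []))
        records.append({"tokens": tokens, "ner_tags": tags})
    # One JSON object per line, ready for datasets.load_dataset("json", ...)
    Path("ner_dataset.jsonl").write_text("\n".join(json.dumps(r) for r in records))
```
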
## Fine-tuning the BERT Transformers model

The training code lives in `source/services/ner/train/train.py`.
It fine-tunes the BERT model and uploads it to the Hugging Face Hub.
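
For orientation, here is a minimal sketch of such a fine-tuning step with the `transformers` `Trainer`. The base model, hyperparameters, label strings, and output repository name are placeholders, not the values actually used by this Space:

```python
# Hypothetical fine-tuning sketch: train a BERT token-classification head on the
# converted dataset and push the result to the Hugging Face Hub.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Same illustrative BIO tag set as sketched above.
ENTITY_TYPES = [
    "LAW_FIRM", "LAW_OFFICE_ADDRESS", "INSURANCE_COMPANY",
    "INSURANCE_COMPANY_ADDRESS", "POLICY_HOLDER_NAME", "BENEFICIARY_NAME",
    "POLICY_NUMBER", "PAYOUT", "REQUIRED_ACTION", "SENDER",
]
LABELS = ["O"] + [f"{p}-{e}" for e in ENTITY_TYPES for p in ("B", "I")]
label2id = {label: i for i, label in enumerate(LABELS)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id=label2id,
)


def tokenize_and_align(batch):
    """Tokenize pre-split words and align BIO tags to the first sub-token only."""
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous, labels = None, []
        for word_id in enc.word_ids(batch_index=i):
            if word_id is None or word_id == previous:
                labels.append(-100)  # ignored by the loss
            else:
                labels.append(label2id[tags[word_id]])
            previous = word_id
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc


dataset = load_dataset("json", data_files="ner_dataset.jsonl", split="train")
dataset = dataset.map(tokenize_and_align, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-legal-ner",      # placeholder Hub repo / local dir
        num_train_epochs=3,
        per_device_train_batch_size=8,
        push_to_hub=True,                 # requires `huggingface-cli login`
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
trainer.push_to_hub()  # uploads model, tokenizer, and config to the Hub
```

Once pushed, the model can be loaded in the Space with `pipeline("token-classification", model="<repo id>")` for inference.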