---
title: Legal Entity NER - Transformers
emoji: 🏆
colorFrom: green
colorTo: gray
sdk: docker
pinned: true
app_file: gradio_ner.py
models : ["aimlnerd/bert-finetuned-legalentity-ner-accelerate"]
tags : ['bert', 'tokenclassification', 'ner', 'transformers']
python_version : 3.11.5
---

# Extract Legal Entities from Insurance Documents using BERT Transformers

This Space uses a fine-tuned BERT Transformers model for named entity recognition (NER) of legal entities in life insurance demand letters.

The dataset is publicly available here:
https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender
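
A token classification model predicts one label per token, so the entities listed above have to be reassembled from consecutive tokens. The following is a minimal sketch of that step, assuming the predictions use a BIO scheme; the label names such as `POLICY_NUMBER` and the helper `group_entities` are hypothetical and may not match the labels the model actually emits.

```python
def group_entities(tokens, tags):
    """Merge token-level BIO tags into (entity_type, text) pairs.

    Assumes tags look like "B-<TYPE>", "I-<TYPE>" or "O" (an assumption,
    not confirmed by the model card).
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an I- tag that does not continue the open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical example input; not taken from the dataset.
tokens = ["Policy", "number", "A-12345", "held", "by", "Jane", "Doe"]
tags = ["O", "O", "B-POLICY_NUMBER", "O", "O",
        "B-POLICY_HOLDER_NAME", "I-POLICY_HOLDER_NAME"]
entities = group_entities(tokens, tags)
```

Here `entities` holds one pair per detected entity, for example the policy number and the policy holder name.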

The dataset consists of legal requisition/demand letters for life insurance. However, this approach can be applied to any industry and document type that may benefit from spatial data in NER training.

## Data preprocessing
The OCRed data is stored as JSON in ```data/raw_data/annotations```.
The following script converts the JSON data into a format suitable for Hugging Face token classification:
```source/services/ner/awscomprehend_2_ner_format.py```
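
As an illustration of what such a conversion involves, here is a minimal sketch that turns character-offset annotations into word-level BIO tags. It is not the script above: the input shape `(start, end, label)`, the whitespace tokenization, and the function name `spans_to_bio` are all assumptions made for the example.

```python
import re


def spans_to_bio(text, spans):
    """Convert character spans into whitespace tokens with BIO tags.

    spans: list of (start, end, label) with end exclusive (assumed format).
    """
    tokens, tags = [], []
    previous = None  # index of the span that covered the previous token
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        current = None
        for i, (start, end, _label) in enumerate(spans):
            if tok_start < end and tok_end > start:  # token overlaps span
                current = i
                break
        if current is None:
            tag = "O"
        else:
            # First token of a span gets B-, following tokens get I-.
            prefix = "I" if current == previous else "B"
            tag = f"{prefix}-{spans[current][2]}"
        tokens.append(match.group())
        tags.append(tag)
        previous = current
    return tokens, tags


# Hypothetical example; the text and labels are made up.
text = "Payout of $50,000 to John Smith"
spans = [(10, 17, "PAYOUT"), (21, 31, "BENEFICIARY_NAME")]
tokens, tags = spans_to_bio(text, spans)
```

The result is a pair of parallel lists (tokens and tags), which is the usual input shape for token classification datasets.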

## Fine-tuning the BERT Transformers model
```source/services/ner/train/train.py```
This code fine-tunes the BERT model and uploads it to the Hugging Face Hub.
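
One step that fine-tuning a BERT token classifier typically needs is aligning word-level labels with subword tokens, because the tokenizer may split one word into several pieces. The sketch below shows a common way to do this, assuming a `word_ids` list like the one returned by a fast tokenizer's `word_ids()` method; whether `train.py` does exactly this is an assumption, and the function name is hypothetical.

```python
def align_labels_with_word_ids(word_labels, word_ids, ignore_index=-100):
    """Expand word-level label ids to subword-token level.

    word_ids maps each token to its source word index, or None for
    special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens are excluded from the loss.
            aligned.append(ignore_index)
        elif word_id != previous_word:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subwords of the same word are ignored.
            aligned.append(ignore_index)
        previous_word = word_id
    return aligned


# Hypothetical example: three words, the second split into two subwords.
word_labels = [0, 3, 4]
word_ids = [None, 0, 1, 1, 2, None]
aligned = align_labels_with_word_ids(word_labels, word_ids)
```

The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so positions marked this way do not contribute to training.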