---
title: Legal Entity NER - Transformers
emoji: π
colorFrom: green
colorTo: gray
sdk: docker
pinned: true
app_file: gradio_ner.py
models : ["aimlnerd/bert-finetuned-legalentity-ner-accelerate"]
tags : ['bert', 'tokenclassification', 'ner', 'transfomers']
python_version : 3.11.5
---
# Extract Legal Entities from Insurance Documents using BERT transformers
This Space uses a fine-tuned BERT transformer for named entity recognition (NER) of legal entities in life insurance demand letters.
The dataset is publicly available here:
https://github.com/aws-samples/aws-legal-entity-extraction.git
The model extracts the following entities:
* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender
The dataset consists of legal requisition/demand letters for life insurance; however, this approach can be used in any industry and for any document type that may benefit from spatial data in NER training.
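As a rough illustration of how the published model could be queried outside this Space, the sketch below loads the model listed in the metadata with the `transformers` token-classification pipeline and groups the predicted spans by label. The helper names `extract_entities` and `group_by_label` are hypothetical, and the exact label strings returned depend on the model's configuration, which is not shown here.

```python
from collections import defaultdict

# Model id taken from the Space metadata.
MODEL_ID = "aimlnerd/bert-finetuned-legalentity-ner-accelerate"


def extract_entities(text, model_id=MODEL_ID):
    """Run the token-classification pipeline and return aggregated spans."""
    # Imported lazily so group_by_label works without transformers installed.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    return ner(text)


def group_by_label(spans):
    """Collect span texts under their predicted label.

    `spans` is a list of dicts with "entity_group" and "word" keys, which is
    the shape the pipeline returns when an aggregation strategy is set.
    """
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["entity_group"]].append(span["word"])
    return dict(grouped)
```

Calling `group_by_label(extract_entities(letter_text))` on the text of a demand letter would then give a dictionary mapping each predicted label to the list of matching text spans. The first call downloads the model weights from the Hub.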
## Data preprocessing
The OCRed data is stored as JSON in ```data/raw_data/annotations```.
I wrote this script to convert the JSON data into a format suitable for Hugging Face token classification:
```source/services/ner/awscomprehend_2_ner_format.py```
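The conversion script itself is not reproduced here. As a minimal sketch of the general idea, assuming each annotation provides a character span and a label (the tuple layout below is hypothetical and may differ from the actual JSON schema), whitespace-separated tokens can be tagged with BIO labels like this:

```python
import re


def spans_to_bio(text, entities):
    """Convert character-level entity spans to word-level BIO tags.

    `entities` is a list of (start, end, label) tuples with `end` exclusive.
    This tuple layout is an assumption made for the sketch.
    Returns (tokens, tags) as two parallel lists.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for ent_start, ent_end, label in entities:
            # A token belongs to an entity if the two ranges overlap.
            if tok_start < ent_end and tok_end > ent_start:
                # The first token of an entity gets B-, later ones get I-.
                prefix = "B" if tok_start <= ent_start else "I"
                tag = f"{prefix}-{label}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags
```

The two returned lists have the same length, so they can be stored as the `tokens` and `ner_tags` columns that token-classification datasets commonly use (column names assumed here).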
## Finetuning the BERT Transformers model
```source/services/ner/train/train.py```
This script fine-tunes the BERT model and uploads it to the Hugging Face Hub.
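
The training script is not reproduced here. One step that token-classification fine-tuning with a word-piece tokenizer usually needs is aligning word-level labels with sub-word tokens. The sketch below shows one common way to do this, assuming integer label ids where each odd id is a `B-` tag and the following even id is the matching `I-` tag. The function name and this label layout are assumptions, not taken from `train.py`.

```python
def align_labels_with_tokens(labels, word_ids):
    """Expand word-level label ids to one label id per sub-word token.

    `word_ids` maps each token to the index of its source word, with None for
    special tokens, as returned by a fast tokenizer's `word_ids()` method.
    """
    new_labels = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens get -100, the default ignore index of the
            # PyTorch cross-entropy loss.
            new_labels.append(-100)
        elif word_id != previous_word:
            # The first sub-word of a word keeps the word's label.
            new_labels.append(labels[word_id])
        else:
            # Later sub-words of the same word: turn B-XXX into I-XXX.
            label = labels[word_id]
            if label % 2 == 1:
                label += 1
            new_labels.append(label)
        previous_word = word_id
    return new_labels
```

For example, with word labels `[3, 0, 1]` and word ids `[None, 0, 0, 1, 2, 2, 2, None]`, the function returns `[-100, 3, 4, 0, 1, 2, 2, -100]`.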