aimlnerd committed on
Commit 26bfc86 · 1 Parent(s): d665726

add readme

Files changed (1)
  1. README.md +30 -1
README.md CHANGED
@@ -7,4 +7,33 @@ sdk: docker
  pinned: true
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Extract Legal Entities from Insurance Documents using BERT Transformers
+
+ This Space uses fine-tuned BERT transformer models for named entity recognition (NER) of legal entities in life insurance demand letters.
+
+ The dataset is publicly available at:
+ https://github.com/aws-samples/aws-legal-entity-extraction.git
+
+ The model extracts the following entities:
+
+ * Law Firm
+ * Law Office Address
+ * Insurance Company
+ * Insurance Company Address
+ * Policy Holder Name
+ * Beneficiary Name
+ * Policy Number
+ * Payout
+ * Required Action
+ * Sender
+
+ The dataset consists of legal requisition/demand letters for life insurance; however, this approach can be applied to documents in any industry that may benefit from spatial data in NER training.
+
+ ## Data preprocessing
+ The OCRed data is stored as JSON in `data/raw_data/annotations`.
+ I wrote the following script to convert this JSON data into a format suitable for Hugging Face token classification:
+ `source/services/ner/awscomprehend_2_ner_format.py`
+
+ ## Fine-tuning the BERT Transformers model
+ The script `source/services/ner/train/train.py`
+ fine-tunes the BERT model and uploads it to the Hugging Face Hub.
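Token classification needs a fixed label set, and the entity list in the README implies one. The README does not say which tagging scheme the model uses. The following minimal sketch assumes a BIO (begin/inside/outside) scheme and expands the ten entity types into label ids. The upper-case identifiers such as `POLICY_NUMBER` are hypothetical spellings, not necessarily the ones used in the repository.

```python
# Entity types from the README; the identifier spellings are hypothetical.
ENTITY_TYPES = [
    "LAW_FIRM",
    "LAW_OFFICE_ADDRESS",
    "INSURANCE_COMPANY",
    "INSURANCE_COMPANY_ADDRESS",
    "POLICY_HOLDER_NAME",
    "BENEFICIARY_NAME",
    "POLICY_NUMBER",
    "PAYOUT",
    "REQUIRED_ACTION",
    "SENDER",
]


def build_bio_labels(entity_types):
    """Return "O" followed by a B- and an I- label for each entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.extend([f"B-{entity}", f"I-{entity}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)

# Integer id <-> tag name mappings, in the shape a token-classification config stores.
id2label = {i: label for i, label in enumerate(LABELS)}
label2id = {label: i for i, label in id2label.items()}
```

You can pass dictionaries of this shape as the `id2label` and `label2id` arguments of `AutoModelForTokenClassification.from_pretrained`. The model's predictions then come back as readable tag names instead of bare integers.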
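A "format suitable for Hugging Face token classification" usually means one label per word, while OCR annotations typically mark entities by character offsets. The following sketch shows that conversion under stated assumptions. It is not the contents of `awscomprehend_2_ner_format.py`. It assumes a hypothetical annotation shape in which each entity has `start`/`end` character offsets (end exclusive) into the OCR text and a `label`. It also splits words on whitespace only. The `tokens`/`ner_tags` output keys follow a common Hugging Face dataset layout.

```python
import re


def spans_to_bio(text, entities):
    """Tag each whitespace-separated word of `text` with a BIO label.

    `entities` is a list of dicts with hypothetical keys "start" and "end"
    (character offsets, end exclusive) and "label" (entity type). A word
    belongs to an entity when their character ranges overlap; the word that
    contains the entity's first character gets "B-", later words get "I-".
    """
    tokens, ner_tags = [], []
    for match in re.finditer(r"\S+", text):
        word_start, word_end = match.start(), match.end()
        tag = "O"
        for ent in entities:
            if word_start < ent["end"] and word_end > ent["start"]:
                prefix = "B" if word_start <= ent["start"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        tokens.append(match.group())
        ner_tags.append(tag)
    return {"tokens": tokens, "ner_tags": ner_tags}


# Hypothetical OCR line and annotations, for illustration only.
example = spans_to_bio(
    "Policy No. 12345 issued by Acme Life",
    [
        {"start": 11, "end": 16, "label": "POLICY_NUMBER"},
        {"start": 27, "end": 36, "label": "INSURANCE_COMPANY"},
    ],
)
```

For this input, `example["ner_tags"]` is `["O", "O", "B-POLICY_NUMBER", "O", "O", "B-INSURANCE_COMPANY", "I-INSURANCE_COMPANY"]`. Whitespace splitting keeps punctuation attached to words (for example `No.`). A real converter may need a finer tokenizer.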
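Fine-tuning BERT on word-level tags also requires mapping each tag onto the WordPiece subword tokens the model actually sees. The following sketch shows an alignment that is commonly used in Hugging Face token-classification examples. Whether `train.py` does exactly this is an assumption. The function takes the `word_ids` list returned by a fast tokenizer's `BatchEncoding.word_ids()`, in which special tokens such as `[CLS]` and `[SEP]` map to `None`. It keeps the label only on the first subword of each word.

```python
from typing import List, Optional

# Default ignore_index of torch.nn.CrossEntropyLoss; positions with this value add no loss.
IGNORE_INDEX = -100


def align_labels_with_tokens(
    word_labels: List[int], word_ids: List[Optional[int]]
) -> List[int]:
    """Expand word-level label ids to one label per subword token.

    `word_ids[i]` is the index of the word that token i came from, or None
    for special tokens such as [CLS] and [SEP]. Only the first subword of
    each word keeps the word's label; special tokens and continuation
    subwords receive IGNORE_INDEX.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None or word_id == previous_word:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous_word = word_id
    return aligned


# Hypothetical example: 7 words with made-up label ids, where the
# second and third words are each split into two subwords.
word_labels = [0, 0, 13, 0, 0, 5, 6]
word_ids = [None, 0, 1, 1, 2, 2, 3, 4, 5, 6, None]
token_labels = align_labels_with_tokens(word_labels, word_ids)
```

Here `token_labels` is `[-100, 0, 0, -100, 13, -100, 0, 0, 5, 6, -100]`. Masking continuation subwords means the loss and the evaluation metrics count each word once. The alternative is to give continuation subwords the matching `I-` label.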