document_redaction / tools / file_conversion.py

Commit History

Added option for running redact function through CLI (i.e. not going through Gradio UI or API). Test functions for running this through AWS Lambda.
e5dfae7

seanpedrickcase committed on

Only shows AWS options when AWS functions enabled. Can now upload previous review files to continue review later. Some review debugging.
e2aae24

seanpedrickcase committed on

Comprehend now uses custom spaCy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some PDF mediabox issues and redacted image output issues.
ec98119

seanpedrickcase committed on

Improved time taken reporting and readme
04d80a1

seanpedrickcase committed on

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version
eea5c07

seanpedrickcase committed on

Upgraded packages. Fixed some issues with review process. Better progress reporting for user.
5b4b5fb

seanpedrickcase committed on

Allowed for PIL to load truncated images to avoid some load errors
a680619

seanpedrickcase committed on

Added 'Review redactions' tab to the app. You can now visually inspect suggested redactions and modify/add with a point and click interface.
ebf9010

seanpedrickcase committed on

General improvement in quick image matching and merging
84c83c0

seanpedrickcase committed on

Optimised Textract and Tesseract workings
8652429

seanpedrickcase committed on

Improved allow list, handwriting/signature identification, logging
6ea0852

seanpedrickcase committed on

Added AWS Textract support. Allowed for OCR logs export.
e9c4101

seanpedrickcase committed on

Enhanced logging of usage. Small buffer added to redaction rectangles, as they often seem to miss the tops of text.
34addbf

seanpedrickcase committed on

Can now select only specific pages in document to redact. Image based redaction should work correctly now.
bc4bdbd

seanpedrickcase committed on

Handles multiple runs with multiple files correctly now. Logging and feedback improvements.
bbf818d

seanpedrickcase committed on

Updated decision making output files, log locations
93ac94f

seanpedrickcase committed on

Decision process now saved as log files. Other log files and feedback added
8c33828

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Separated file preparation and file redaction functions. Hopefully STS endpoint access now works on AWS
0f18146

seanpedrickcase committed on

Added some commentary to file conversion and redaction
a63133d

seanpedrickcase committed on

Page conversion now runs as page-by-page calls, hopefully avoiding FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds in case that helps too.
43287c3

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on