Commit History

Added option for running redact function through CLI (i.e. not going through Gradio UI or API). Test functions for running this through AWS Lambda.
e5dfae7

seanpedrickcase committed on

Only shows AWS options when AWS functions enabled. Can now upload previous review files to continue review later. Some review debugging.
e2aae24

seanpedrickcase committed on

Submitting modified redactions will no longer overwrite default labels
e69ae00

seanpedrickcase committed on

Should now retain modified redactions on first use of zoom
face41c

seanpedrickcase committed on

Comprehend now uses custom spacy recognisers on top of defaults. Added zoom functionality to annotator. Fixed some pdf mediabox issues and redacted image output issues.
ec98119

seanpedrickcase committed on

AWS Comprehend query numbers in logs should now add up correctly
c71d0c1

seanpedrickcase committed on

Restored the file redaction timeout (before the request is resent) to its 105-second default
f5b6c1b

seanpedrickcase committed on

Logs should now only be updated once per file run
2e71433

seanpedrickcase committed on

Improved time taken reporting and readme
04d80a1

seanpedrickcase committed on

Consolidated AWS Comprehend redaction calls to reduce total number
542c252

seanpedrickcase committed on

When on AWS, now loads in a default allow_list to exclude common words from redaction. Improved checks on AWS Comprehend calls.
390bef2

seanpedrickcase committed on

Changed default options for AWS.
056204b

seanpedrickcase committed on

Added support for AWS Comprehend for PII identification. OCR and detection results now written to main output
f0f9378

seanpedrickcase committed on

Allowed for time limits on redact to avoid timeouts. Improved review interface. Now accepts only one file at a time. Upgraded Gradio version
eea5c07

seanpedrickcase committed on

App will now try to save modified redactions from user to json file.
4805b1c

seanpedrickcase committed on

Upgraded packages. Fixed some issues with review process. Better progress reporting for user.
5b4b5fb

seanpedrickcase committed on

Allowed for PIL to load truncated images to avoid some load errors
a680619

seanpedrickcase committed on

Added 'Review redactions' tab to the app. You can now visually inspect suggested redactions and modify/add with a point and click interface.
ebf9010

seanpedrickcase committed on

Adjusted outputs correctly for situations where the pdf mediabox size is different from the visible page size
15026f7

seanpedrickcase committed on

Redaction tool can now export pdfs with selectable text retained - redacted text is deleted and covered with a black box. Licence change for pymupdf use.
339a165

seanpedrickcase committed on

General improvement in quick image matching and merging
84c83c0

seanpedrickcase committed on

Generally improved OCR recognition of texts, corrected postcode regex
a748df6

seanpedrickcase committed on

Optimised Textract and Tesseract workings
8652429

seanpedrickcase committed on

Improved allow list, handwriting/signature identification, logging
6ea0852

seanpedrickcase committed on

Added AWS Textract support. Allowed for OCR logs export.
e9c4101

seanpedrickcase committed on

Updated time sum function to sum correctly
e1c402a

seanpedrickcase committed on

Updated default AWS_FUNCTION value. Logs seconds values from outputs correctly.
7aa4d5f

seanpedrickcase committed on

Should now correctly extract and sum up total processing time
f8700a5

seanpedrickcase committed on

Enhanced logging of usage. Added a small buffer to redaction rectangles, as they often seem to miss the tops of text.
34addbf

seanpedrickcase committed on

Can now select only specific pages in a document to redact. Image-based redaction should work correctly now.
bc4bdbd

seanpedrickcase committed on

Handles multiple runs with multiple files correctly now. Logging and feedback improvements.
bbf818d

seanpedrickcase committed on

Updated decision making output files, log locations
93ac94f

seanpedrickcase committed on

Decision process now saved as log files. Other log files and feedback added
8c33828

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Minor bug fix to connection parameter function
275c820

seanpedrickcase committed on

Added possibility to do authentication with AWS Cognito on load. Other minor changes.
bc22fc4

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Version 0.1. Adapted code for pyinstaller local executable conversion (Windows)
2a4b347

seanpedrickcase committed on

Separated file preparation and file redaction functions. Hopefully sts endpoint access now works on AWS
0f18146

seanpedrickcase committed on

Added some commentary to file conversion and redaction
a63133d

seanpedrickcase committed on

Page conversion now makes page-by-page calls, hopefully avoiding fastapi timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds just in case that helps too.
43287c3

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on