document_redaction / tools / file_conversion.py

Commit History

Added AWS Textract support. Allowed for OCR logs export.
e9c4101

seanpedrickcase committed on

Enhanced logging of usage. Added a small buffer to redaction rectangles, as they often seemed to miss the tops of text.
34addbf

seanpedrickcase committed on

Works correctly with images again
230fcc3

seanpedrickcase committed on

Can now select only specific pages in a document to redact. Image-based redaction should work correctly now.
bc4bdbd

seanpedrickcase committed on

Handles multiple runs with multiple files correctly now. Logging and feedback improvements.
bbf818d

seanpedrickcase committed on

Updated decision making output files, log locations
93ac94f

seanpedrickcase committed on

Decision process now saved as log files. Other log files and feedback added
8c33828

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as an image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Separated file preparation and file redaction functions. Hopefully STS endpoint access now works on AWS
0f18146

seanpedrickcase committed on

Added some commentary to file conversion and redaction
a63133d

seanpedrickcase committed on

Page conversion now makes page-by-page calls, hopefully avoiding FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds in case that helps too.
43287c3

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on