Added AWS Textract support. OCR logs can now be exported. e9c4101 seanpedrickcase committed on Sep 18, 2024
Updated default AWS_FUNCTION value. Now logs seconds values from outputs correctly. 7aa4d5f seanpedrickcase committed on Sep 17, 2024
Should now correctly extract and sum up total processing time f8700a5 seanpedrickcase committed on Sep 16, 2024
Enhanced logging of usage. Small buffer added to redaction rectangles, as they often seem to miss the tops of text. 34addbf seanpedrickcase committed on Sep 16, 2024
Can now select only specific pages in document to redact. Image based redaction should work correctly now. bc4bdbd seanpedrickcase committed on Sep 3, 2024
Handles multiple runs with multiple files correctly now. Logging and feedback improvements. bbf818d seanpedrickcase committed on Aug 21, 2024
Decision process now saved as log files. Other log files and feedback added 8c33828 seanpedrickcase committed on Aug 20, 2024
Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation 01c88c0 seanpedrickcase committed on Aug 15, 2024
Added possibility to do authentication with AWS Cognito on load. Other minor changes. bc22fc4 seanpedrickcase committed on Jul 15, 2024
Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default 7810536 seanpedrickcase committed on Jun 21, 2024
Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version 12224f5 seanpedrickcase committed on Jun 6, 2024
Version 0.1. Adapted code for pyinstaller local executable conversion (Windows) 2a4b347 seanpedrickcase committed on May 22, 2024
Added TLDExtract cache files so that internet connection is not required dce6100 seanpedrickcase committed on May 20, 2024
Re-arranged image and text analysis to encourage text analysis (faster) 72a4f68 seanpedrickcase committed on May 16, 2024
Separated file preparation and file redaction functions. Hopefully STS endpoint access now works on AWS 0f18146 seanpedrickcase committed on May 15, 2024
Added some commentary to file conversion and redaction a63133d seanpedrickcase committed on May 13, 2024
Page conversion now uses page-by-page calls, hopefully avoiding FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds in case that helps too. 43287c3 seanpedrickcase committed on May 13, 2024
Removed version specifications for gradio and spacy in requirements; latest gradio is then reinstalled afterwards in the Dockerfile, all to try to avoid a typer conflict 619a281 seanpedrickcase committed on May 13, 2024
Removed spacy version specification (3.7.4), as it creates a conflict with latest gradio version (4.31.0) 16dc1f9 seanpedrickcase committed on May 13, 2024
Changed boto3 package version in requirements to latest valid version (1.34.103) efd2dce seanpedrickcase committed on May 13, 2024
Updated gradio version to latest (4.31.0) in the hope of addressing AWS server timeout issues. Other tested package versions specified in requirements. 44647fa seanpedrickcase committed on May 13, 2024
Specify GRADIO_SERVER_NAME variable in Dockerfile as 0.0.0.0 85a7cbf seanpedrickcase committed on Apr 25, 2024
Modified Dockerfile to run with user 1000. Changed port to standard 7860 and removed server name specification. 71761cb seanpedrickcase committed on Apr 25, 2024
Added opencv installation to Dockerfile and reverted to slim-bookworm bffbd2b seanpedrickcase committed on Apr 25, 2024
Changed base python distribution to (hopefully) have access to tesseract-ocr package 5f91219 seanpedrickcase committed on Apr 25, 2024
Added -y to tesseract-ocr installation in Dockerfile b723aad seanpedrickcase committed on Apr 25, 2024
Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction. 37d982e seanpedrickcase committed on Apr 25, 2024