document_redaction / Dockerfile

Commit History

Added logging, anonymisation of all Excel sheets, simple redaction tags, and some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as an image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Added TLDExtract cache files so that internet connection is not required
dce6100

seanpedrickcase committed on

Page conversion now uses page-by-page calls, hopefully to avoid FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds in case that helps too.
43287c3

seanpedrickcase committed on

Correctly spelled --no-cache-dir this time
452d304

seanpedrickcase committed on

Unspecifying gradio and spacy in requirements, then reinstalling the latest gradio afterwards in the Dockerfile, all to try to avoid a typer conflict
619a281

seanpedrickcase committed on

Explicitly created output folder in Dockerfile
d32c12a

seanpedrickcase committed on

Specify GRADIO_SERVER_NAME variable in Dockerfile as 0.0.0.0
85a7cbf

seanpedrickcase committed on

Modified Dockerfile to run with user 1000. Changed port to standard 7860 and removed server name specification.
71761cb

seanpedrickcase committed on

Added OpenCV installation to Dockerfile and reverted to slim-bookworm
bffbd2b

seanpedrickcase committed on

Changed base Python distribution to (hopefully) have access to the tesseract-ocr package
5f91219

seanpedrickcase committed on

Added -y to tesseract-ocr installation in Dockerfile
b723aad

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on