document_redaction / Dockerfile

Commit History

Updated Dockerfile and entrypoint file to hopefully deal correctly with APP_MODE environment variable
7c7fd7c

seanpedrickcase committed on

Moved chmod command to before user switch in Dockerfile
05c20d6

seanpedrickcase committed on

Ensure entrypoint.sh is copied
3dc1171

seanpedrickcase committed on

Modified Dockerfile hopefully to not need Lambda overrides. Looking into custom headers from CloudFront to try to get them to work
bf7bb79

seanpedrickcase committed on

Created custom csvlogger to try to overcome AWS Lambda's incompatibility with multithread locks
34bd97b

seanpedrickcase committed on

Changed app_mode arg position in dockerfile, changed default to gradio
d0b63c6

seanpedrickcase committed on

Moved entrypoint.sh creation to before user switch to avoid permission errors
7e8c1c9

seanpedrickcase committed on

Updated Dockerfile and requirements to include relevant Lambda packages
3f9e976

seanpedrickcase committed on

Switched start py file through Dockerfile to lambda_entrypoint. Added gradio links from this .py
6622361

seanpedrickcase committed on

Some more debugging. Added aws-lambda-adapter just in case that's useful in AWS Lambda
a3ba5e2

seanpedrickcase committed on

Added option for running redact function through CLI (i.e. not going through Gradio UI or API). Test functions for running this through AWS Lambda.
e5dfae7

seanpedrickcase committed on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase committed on

Can now redact text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase committed on

Better redaction output formatting. Custom output folders allowed. Upgraded Gradio version
12224f5

seanpedrickcase committed on

Added TLDExtract cache files so that internet connection is not required
dce6100

seanpedrickcase committed on

Page conversion now uses page-by-page calls, hopefully to avoid FastAPI timeouts on AWS. gunicorn keep_alive parameter extended to 60 seconds just in case that helps too.
43287c3

seanpedrickcase committed on

correctly spelled --no-cache-dir this time
452d304

seanpedrickcase committed on

Unspecifying gradio and spacy in requirements, then reinstalling latest gradio afterwards in Dockerfile. All to try to avoid typer conflict
619a281

seanpedrickcase committed on

Created output folder specifically in Dockerfile
d32c12a

seanpedrickcase committed on

Specify GRADIO_SERVER_NAME variable in Dockerfile as 0.0.0.0
85a7cbf

seanpedrickcase committed on

Modified Dockerfile to run with user 1000. Changed port to standard 7860 and removed server name specification.
71761cb

seanpedrickcase committed on

Added opencv installation to dockerfile and reverted to slim-bookworm
bffbd2b

seanpedrickcase committed on

Changed base python distribution to (hopefully) have access to tesseract-ocr package
5f91219

seanpedrickcase committed on

Added -y to tesseract-ocr installation in Dockerfile
b723aad

seanpedrickcase committed on

Added -y to poppler-utils installation in Dockerfile. Added support for image files in image-based redaction.
37d982e

seanpedrickcase committed on