seanpedrickcase committed on
Commit bffbd2b (1 parent: cf9d95c)

Added opencv installation to dockerfile and reverted to slim-bookworm

Files changed (3)
  1. Dockerfile +3 -2
  2. README.md +1 -0
  3. requirements.txt +1 -1
Dockerfile CHANGED
@@ -1,11 +1,12 @@
-#FROM public.ecr.aws/docker/library/python:3.11.9-slim-bookworm
-FROM public.ecr.aws/docker/library/python:3.11.9
+FROM public.ecr.aws/docker/library/python:3.11.9-slim-bookworm
 
 # Install system dependencies. Need to specify -y for poppler to get it to install
 RUN apt-get update \
 && apt-get install -y \
 tesseract-ocr -y \
 poppler-utils -y \
+libgl1-mesa-glx -y \
+libglib2.0-0 -y \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
 
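The two new apt packages are there to satisfy OpenCV's native dependencies on the slim base image: the `opencv-python` wheel expects `libGL.so.1` (pulled in by `libgl1-mesa-glx`) and GLib shared libraries (from `libglib2.0-0`), and without them `import cv2` typically fails with an `ImportError` naming the missing `.so` file. A minimal smoke test, assuming it is run with the container's Python interpreter after the build, could look like this (the function name `check_opencv` is hypothetical and not part of the repo):

```python
import importlib


def check_opencv() -> tuple[bool, str]:
    """Try to import OpenCV and report whether it loaded.

    Hypothetical helper, not part of the repo. On slim Debian images an
    ImportError here usually names a missing shared library (for example
    libGL.so.1), which the extra apt packages are meant to provide.
    """
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError as exc:
        # Either opencv-python is not installed, or a system library is missing.
        return False, f"cv2 import failed: {exc}"
    return True, f"cv2 {cv2.__version__} imported OK"
```

Inside the built image, `check_opencv()` should return `(True, ...)`; a `False` result whose message mentions a missing `.so` file points to an absent system package rather than a missing Python package.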
README.md CHANGED
@@ -16,3 +16,4 @@ Take an image-based or text-based PDF document and redact any personal informati
 
 WARNING: This is a beta product. It is not 100% accurate, and it will miss some personal information. It is essential that all outputs are checked **by a human** to ensure that all personal information has been removed.
 
+Other redaction entities are possible to include in this app easily, especially country-specific entities. If you want to use these, clone the repo locally and add entity names from [this link](https://microsoft.github.io/presidio/supported_entities/) to the 'full_entity_list' variable in app.py.
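The README line above describes extending `full_entity_list` in app.py with names from Presidio's supported-entities page. The sketch below assumes that variable is a plain Python list of entity-name strings (its real contents are not shown in this commit, so the starting values are placeholders) and that it ends up as the `entities` argument of Presidio's `AnalyzerEngine.analyze`. `add_entities` and `analyze_text` are hypothetical helpers; the check against `get_supported_entities` is there to catch misspelled names or entities that have no recognizer loaded for English.

```python
from typing import Iterable, List

# Hypothetical stand-in for 'full_entity_list' in app.py; the real list may differ.
full_entity_list: List[str] = ["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"]


def add_entities(base: List[str], extra: Iterable[str]) -> List[str]:
    """Return a new list with extra entity names appended, skipping duplicates."""
    result = list(base)
    for name in extra:
        if name not in result:
            result.append(name)
    return result


# Country-specific entity names listed on Presidio's supported-entities page.
full_entity_list = add_entities(full_entity_list, ["UK_NHS", "US_SSN"])


def analyze_text(text: str, entities: List[str]):
    """Hypothetical helper: run Presidio's analyzer restricted to `entities`.

    Needs presidio_analyzer plus its default spaCy English model installed.
    """
    from presidio_analyzer import AnalyzerEngine

    analyzer = AnalyzerEngine()
    supported = set(analyzer.get_supported_entities(language="en"))
    unsupported = set(entities) - supported
    if unsupported:
        # Misspelled names, or entities with no recognizer loaded for English.
        raise ValueError(f"No recognizer loaded for: {sorted(unsupported)}")
    return analyzer.analyze(text=text, entities=entities, language="en")
```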
requirements.txt CHANGED
@@ -1,6 +1,6 @@
 pdfminer.six==20231228
 pdf2image==1.17.0
-#img2pdf==0.5.1
+opencv-python
 presidio_analyzer==2.2.351
 presidio_anonymizer==2.2.351
 presidio-image-redactor==0.0.52
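After this change, `opencv-python` is the only line in requirements.txt without an `==` pin, so a later rebuild of the image may install a different OpenCV release than the one tested here. A small, simplified check makes that visible; the helper name is hypothetical, and it only understands plain `name==version` lines (no extras, environment markers, or `-r` includes):

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned with '=='.

    Simplified parser: strips '#' comments and blank lines, and treats any
    remaining line without '==' as unpinned (so options like '-r' would be
    reported too).
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# requirements.txt as it stands after this commit.
requirements = """\
pdfminer.six==20231228
pdf2image==1.17.0
opencv-python
presidio_analyzer==2.2.351
presidio_anonymizer==2.2.351
presidio-image-redactor==0.0.52
"""

print(unpinned_requirements(requirements))  # ['opencv-python']
```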