seanpedrickcase committed bffbd2b (parent: cf9d95c)
Added opencv installation to dockerfile and reverted to slim-bookworm
Files changed:
- Dockerfile +3 -2
- README.md +1 -0
- requirements.txt +1 -1
Dockerfile (changed)

```diff
@@ -1,11 +1,12 @@
-
-FROM public.ecr.aws/docker/library/python:3.11.9
+FROM public.ecr.aws/docker/library/python:3.11.9-slim-bookworm
 
 # Install system dependencies. Need to specify -y for poppler to get it to install
 RUN apt-get update \
 && apt-get install -y \
 tesseract-ocr -y \
 poppler-utils -y \
+libgl1-mesa-glx -y \
+libglib2.0-0 -y \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
 
```
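The two new apt packages appear to be there because `opencv-python` loads system shared libraries (such as `libGL.so.1`) that the slim-bookworm base image does not ship. Without them, `import cv2` typically fails with an ImportError about a missing shared object. Below is a minimal sketch for smoke-testing a built image, assuming it is run with the container's own Python. `check_imports` and `check_shared_libs` are hypothetical helpers, not part of this repo, and the mapping of `GL`/`glib-2.0` to the two apt packages is an assumption.

```python
import ctypes.util
import importlib


def check_imports(modules):
    """Try to import each module; map its name to None on success or the error text on failure."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # A missing shared library (e.g. libGL.so.1) surfaces here as an ImportError
            results[name] = str(exc)
    return results


def check_shared_libs(libs):
    """Map each library short name to the file ctypes can locate, or None if it is not found."""
    return {lib: ctypes.util.find_library(lib) for lib in libs}


if __name__ == "__main__":
    # cv2 is the import name of the opencv-python package added to requirements.txt
    for name, error in check_imports(["cv2"]).items():
        print(f"{name}: {'ok' if error is None else error}")
    # Assumed: libgl1-mesa-glx provides libGL and libglib2.0-0 provides libglib-2.0
    for lib, path in check_shared_libs(["GL", "glib-2.0"]).items():
        print(f"lib{lib}: {path or 'not found'}")
```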
README.md (changed)

```diff
@@ -16,3 +16,4 @@ Take an image-based or text-based PDF document and redact any personal informati
 
 WARNING: This is a beta product. It is not 100% accurate, and it will miss some personal information. It is essential that all outputs are checked **by a human** to ensure that all personal information has been removed.
 
+Other redaction entities are possible to include in this app easily, especially country-specific entities. If you want to use these, clone the repo locally and add entity names from [this link](https://microsoft.github.io/presidio/supported_entities/) to the 'full_entity_list' variable in app.py.
```
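The README step amounts to appending Presidio entity names to a Python list. Below is a minimal sketch. It assumes `full_entity_list` in app.py is a plain list of entity-name strings. The list's real contents are not shown in this commit, so the starting values are illustrative, and `add_entities` is a hypothetical helper. `UK_NHS` and `IBAN_CODE` are examples of names on the Presidio supported-entities page.

```python
def add_entities(base, extra):
    """Return base with any new names from extra appended, preserving order and skipping duplicates."""
    combined = list(base)
    for name in extra:
        if name not in combined:
            combined.append(name)
    return combined


# Hypothetical starting list; the real one lives in app.py and may differ
full_entity_list = ["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "LOCATION"]

# A country-specific entity (UK_NHS) and a global one (IBAN_CODE); "PERSON" is already present
full_entity_list = add_entities(full_entity_list, ["UK_NHS", "IBAN_CODE", "PERSON"])

print(full_entity_list)
# → ['PERSON', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'LOCATION', 'UK_NHS', 'IBAN_CODE']
```

Presidio's `AnalyzerEngine.analyze` accepts a list like this through its `entities` parameter. Deduplicating keeps the list tidy if the same name is added twice.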
requirements.txt (changed)

```diff
@@ -1,6 +1,6 @@
 pdfminer.six==20231228
 pdf2image==1.17.0
-
+opencv-python
 presidio_analyzer==2.2.351
 presidio_anonymizer==2.2.351
 presidio-image-redactor==0.0.52
```
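In the updated requirements.txt, `opencv-python` is the only entry without an `==` version pin, so separate image builds may install different OpenCV releases. Below is a minimal sketch that lists unpinned lines. It assumes the simple `name==version` format used in this file and does not handle extras, environment markers, or other specifiers. `unpinned` is a hypothetical helper.

```python
def unpinned(requirement_lines):
    """Return requirement lines that are not pinned with '=='."""
    found = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if "==" not in line:
            found.append(line)
    return found


# Contents of requirements.txt after this commit
requirements = """\
pdfminer.six==20231228
pdf2image==1.17.0
opencv-python
presidio_analyzer==2.2.351
presidio_anonymizer==2.2.351
presidio-image-redactor==0.0.52
"""

print(unpinned(requirements.splitlines()))
# → ['opencv-python']
```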