MattStammers committed
Commit ce9b3d0
1 parent: 4b8b590
working full app deployment backup
app.py
CHANGED
@@ -282,36 +282,74 @@ def redact_and_visualize(text: str, model_name: str):
|
|
282 |
|
283 |
|
284 |
hint = """
|
285 |
-
|
286 |
-
<img src="https://github.com/MattStammers/Pteredactyl/blob/main/src/pteredactyl_webapp/assets/img/SETT_Logo.jpg" alt="SETT Logo" />
|
287 |
-
</p>
|
288 |
|
289 |
-
|
290 |
|
291 |
-
|
292 |
|
293 |
-
|
294 |
|
295 |
-
|
296 |
|
297 |
-
|
298 |
|
299 |
-
|
300 |
|
301 |
-
|
302 |
|
303 |
-
Please note if deploying the docker image the port bindings are to 7860. The image can
|
304 |
|
305 |
```bat
|
306 |
docker build -t pteredactyl:latest .
|
307 |
docker run -d -p 7860:7860 --name pteredactyl-app pteredactyl:latest
|
308 |
```
|
309 |
|
310 |
## Logo
|
311 |
|
312 |
-
<
|
313 |
-
<img src="https://
|
314 |
-
</
|
315 |
|
316 |
## Background
|
317 |
|
@@ -356,6 +394,7 @@ We invite the open-source community to collaborate to improve the present result
|
|
356 |
### References:
|
357 |
1. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Feb 1;30(2):318–28.
|
358 |
2. Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inf. 2022 Dec 1;168:104880.
|
|
|
359 |
"""
|
360 |
|
361 |
description = """
|
|
|
282 |
|
283 |
|
284 |
hint = """
|
285 |
+
# Pteredactyl
|
286 |
|
287 |
+
_Pteredactyl uses natural language processing to identify and anonymize clinical personally identifiable information (cPII) in clinical free text. It is built on top of Microsoft's [Presidio](https://microsoft.github.io/presidio/) and lets you swap in different transformer models from [Hugging Face](https://huggingface.co/)._
|
288 |
|
289 |
+
## Features
|
290 |
|
291 |
+
- Anonymization of entities such as names, locations, and phone numbers, as listed in the [documentation](https://mattstammers.github.io/Pteredactyl)
|
292 |
+
- Support for processing both strings and pandas DataFrames
|
293 |
+
- Text highlighting for easy identification of anonymized elements
|
294 |
+
- Webapp with [Gradio](https://huggingface.co/spaces/MattStammers/pteredactyl_PII)
|
295 |
+
- cPII benchmarking test: [Clinical_PII_Redaction_Test](https://huggingface.co/datasets/MattStammers/Clinical_PII_Redaction_Test)
|
296 |
+
- Production API deployed using [Docker](https://www.docker.com/) and [Gradio](https://www.gradio.app/)
|
297 |
+
- Hide-in-plain-sight replacement or masking option
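The difference between masking and hide-in-plain-sight replacement can be illustrated with a minimal sketch. This is not Pteredactyl's API: the `Span` class, the `SURROGATES` table, and the `redact` and `find_span` functions are hypothetical names introduced only for illustration, and the entity spans are assumed to have already been detected by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected entity: character offsets plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


# Hypothetical surrogate values used for hide-in-plain-sight replacement.
SURROGATES = {
    "PERSON": "Jane Doe",
    "LOCATION": "Springfield",
}


def find_span(text: str, substring: str, label: str) -> Span:
    """Build a Span for the first occurrence of substring (stand-in for a real detector)."""
    start = text.index(substring)
    return Span(start, start + len(substring), label)


def redact(text: str, spans: list[Span], mode: str = "mask") -> str:
    """Replace each span with a <LABEL> mask or with a realistic surrogate value."""
    if mode not in {"mask", "hips"}:
        raise ValueError(f"unknown mode: {mode!r}")
    redacted = text
    # Work from the end of the text so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        mask = f"<{span.label}>"
        replacement = mask if mode == "mask" else SURROGATES.get(span.label, mask)
        redacted = redacted[:span.start] + replacement + redacted[span.end:]
    return redacted


text = "Patient John Smith was seen in Southampton."
spans = [
    find_span(text, "John Smith", "PERSON"),
    find_span(text, "Southampton", "LOCATION"),
]

print(redact(text, spans, mode="mask"))  # Patient <PERSON> was seen in <LOCATION>.
print(redact(text, spans, mode="hips"))  # Patient Jane Doe was seen in Springfield.
```

Masking makes every redaction visible, whereas hide-in-plain-sight output reads like ordinary text, so any entity the detector missed is harder to tell apart from a surrogate.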
|
298 |
+
|
299 |
+
## Documentation
|
300 |
|
301 |
+
* Full documentation is available [here](https://mattstammers.github.io/Pteredactyl)
|
302 |
|
303 |
+
## PyPI Installation
|
304 |
+
|
305 |
+
Pteredactyl can be installed from PyPI using pip:
|
306 |
+
|
307 |
+
```bash
|
308 |
+
pip install pteredactyl
|
309 |
+
```
|
310 |
+
## Gradio Web App
|
311 |
|
312 |
+
This webapp is available online as a Gradio app on Hugging Face: [Hugging Face Gradio App](https://huggingface.co/spaces/MattStammers/pteredactyl_PII). It is also available as [source](https://github.com/SETT-Centre-Data-and-AI/PteRedactyl) or as a Docker image: [Docker Image](https://registry.hub.docker.com/r/mattstammers/pteredactyl).
|
313 |
|
314 |
+
## Docker Deployment
|
315 |
|
316 |
+
When deploying the Docker image, note that the app is bound to port 7860. The image can be built and run from source using the following commands:
|
317 |
|
318 |
```bat
|
319 |
docker build -t pteredactyl:latest .
|
320 |
docker run -d -p 7860:7860 --name pteredactyl-app pteredactyl:latest
|
321 |
```
|
322 |
|
323 |
+
Alternatively, the image can be deployed directly from [Docker Hub](https://registry.hub.docker.com/r/mattstammers/pteredactyl).
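Once the container is running, it can be useful to confirm that something is listening on the published port before opening the app. The following is a minimal sketch using only the Python standard library; the `wait_for_port` helper and its parameters are hypothetical and not part of Pteredactyl. It assumes the container was started with the `-p 7860:7860` binding described above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumes the container is published on localhost:7860):
# wait_for_port("localhost", 7860)
```

A successful TCP connection only shows that the port is open; it does not confirm that the Gradio app has finished loading its models.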
|
324 |
+
|
325 |
+
## Contributions
|
326 |
+
Interested in contributing? Check out the contributing guidelines.
|
327 |
+
|
328 |
+
* [Developer Guide](https://github.com/MattStammers/Pteredactyl/blob/main/CONTRIBUTING.md)
|
329 |
+
|
330 |
+
Please note that this project follows the [GitHub code of conduct](https://docs.github.com/en/site-policy/github-terms/github-community-code-of-conduct). By contributing to this project, you agree to abide by its terms.
|
331 |
+
|
332 |
+
## License
|
333 |
+
Pteredactyl was created at University Hospital Southampton NHSFT by the Research Data Science Team. It is licensed under the terms of the MIT license.
|
334 |
+
|
335 |
## Logo
|
336 |
|
337 |
+
<picture align="center">
|
338 |
+
<img alt="Pteredactyl Logo" src="https://raw.githubusercontent.com/MattStammers/Pteredactyl/main/src/pteredactyl_webapp/assets/img/Pteredactyl_Logo.jpg">
|
339 |
+
</picture>
|
340 |
+
|
341 |
+
# Abstract
|
342 |
+
|
343 |
+
- Authors: Matt Stammers🧪, Cai Davis🥼 and Michael George🩺
|
344 |
+
|
345 |
+
- Version 1. 29/06/2024
|
346 |
+
|
347 |
+
Clinical patient identifiable information (cPII) presents a significant challenge in natural language processing (NLP) that has yet to be fully resolved, although significant progress is being made [1,2].
|
348 |
+
|
349 |
+
This is why we created [Pteredactyl](https://pypi.org/project/pteredactyl/), a Python module to help with redaction of clinical free text.
|
350 |
+
|
351 |
+
Full [documentation](https://github.com/MattStammers/Pteredactyl) for the project is available on GitHub.
|
352 |
+
|
353 |
|
354 |
## Background
|
355 |
|
|
|
394 |
### References:
|
395 |
1. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Feb 1;30(2):318–28.
|
396 |
2. Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inf. 2022 Dec 1;168:104880.
|
397 |
+
|
398 |
"""
|
399 |
|
400 |
description = """
|