MattStammers committed on
Commit
ce9b3d0
1 Parent(s): 4b8b590

working full app deployment backup

Files changed (1)
  1. app.py +53 -14
app.py CHANGED
@@ -282,36 +282,74 @@ def redact_and_visualize(text: str, model_name: str):
282
 
283
 
284
  hint = """
285
- <p align="center">
286
- <img src="https://github.com/MattStammers/Pteredactyl/blob/main/src/pteredactyl_webapp/assets/img/SETT_Logo.jpg" alt="SETT Logo" />
287
- </p>
288
 
289
- ## Pteredactyl Gradio Webapp and API
290
 
291
- Clinical patient identifiable information (cPII) presents a significant challenge in natural language processing (NLP) that has yet to be fully resolved but significant progress is being made [1,2].
292
 
293
- This is why we created [Pteredactyl](https://pypi.org/project/pteredactyl/) - a python module to help with redaction of clinical free text.
294
 
295
- ## Tool Usage Instructions
296
 
297
- When the input text is entered, the tool redacts the cPII from the entered text using NLP with labelled masking tokens and then assesses the models results. You can test the text against different models by selecting from the dropdown.
298
 
299
- ## Deployment Options
300
 
301
- This webapp is available online as a gradio app on Huggingface: [Huggingface Gradio App](https://huggingface.co/spaces/MattStammers/pteredactyl_PII). It is also available as [source](https://github.com/SETT-Centre-Data-and-AI/PteRedactyl) or as a Docker Image: [Docker Image](https://registry.hub.docker.com/r/mattstammers/pteredactyl). All are MIT licensed.
302
 
303
- Please note if deploying the docker image the port bindings are to 7860. The image can also be deployed from source using the following command:
304
 
305
  ```bat
306
  docker build -t pteredactyl:latest .
307
  docker run -d -p 7860:7860 --name pteredactyl-app pteredactyl:latest
308
  ```
309
 
310
  ## Logo
311
 
312
- <p align="center">
313
- <img src="https://github.com/MattStammers/Pteredactyl/blob/main/src/pteredactyl_webapp/assets/img/Pteredactyl_Logo.jpg" alt="SETT Logo" />
314
- </p>
315
 
316
  ## Background
317
 
@@ -356,6 +394,7 @@ We invite the open-source community to collaborate to improve the present result
356
  ### References:
357
  1. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Feb 1;30(2):318–28.
358
  2. Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inf. 2022 Dec 1;168:104880.
 
359
  """
360
 
361
  description = """
 
282
 
283
 
284
  hint = """
285
+ # Pteredactyl
286
 
287
+ _Pteredactyl uses natural language processing to identify and anonymize clinical personally identifiable information (cPII) in clinical free text. It is built on top of Microsoft's [Presidio](https://microsoft.github.io/presidio/) and lets you swap in different transformer models from [Hugging Face](https://huggingface.co/)._
288
 
289
+ ## Features
290
 
291
+ - Anonymization of various entities such as names, locations, and phone numbers as per our [Documentation](https://mattstammers.github.io/Pteredactyl)
292
+ - Support for processing both strings and pandas DataFrames
293
+ - Text highlighting for easy identification of anonymized elements
294
+ - Webapp with [Gradio](https://huggingface.co/spaces/MattStammers/pteredactyl_PII)
295
+ - cPII benchmarking test: [Clinical_PII_Redaction_Test](https://huggingface.co/datasets/MattStammers/Clinical_PII_Redaction_Test)
296
+ - Production API deployed using [Docker](https://www.docker.com/) and [Gradio](https://www.gradio.app/)
297
+ - Hide-in-plain-sight replacement or masking option
298
+
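The difference between the masking and hide-in-plain-sight options listed above can be sketched in a few lines of Python. This is an illustration only, not Pteredactyl's API: the `Span` type, the `redact` function, the `SURROGATES` table, and the hard-coded entity offsets are all hypothetical, and in practice the spans would come from an NER model rather than being written by hand.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """A detected entity: character offsets plus a label (hypothetical type)."""
    start: int
    end: int
    label: str


# Hypothetical surrogate values used for hide-in-plain-sight replacement.
SURROGATES = {"PERSON": "Alex Smith", "LOCATION": "Springfield"}


def redact(text: str, spans: List[Span], replace: bool = False) -> str:
    """Mask each span as <LABEL>, or swap in a surrogate when replace=True."""
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        mask = f"<{span.label}>"
        token = SURROGATES.get(span.label, mask) if replace else mask
        out = out[:span.start] + token + out[span.end:]
    return out


text = "John Doe was seen in Southampton."
spans = [Span(0, 8, "PERSON"), Span(21, 32, "LOCATION")]

masked = redact(text, spans)                  # labelled masking tokens
replaced = redact(text, spans, replace=True)  # realistic surrogate values
```

Masking makes the redaction visible to the reader, whereas surrogate replacement keeps the text natural, so any identifier the detector misses is harder to tell apart from the substituted values.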
299
+ ## Documentation
300
 
301
+ * Full documentation is available [here](https://mattstammers.github.io/Pteredactyl)
302
 
303
+ ## PyPI Installation
304
+
305
+ Pteredactyl can be installed from PyPI using pip:
306
+
307
+ ```bash
308
+ pip install pteredactyl
309
+ ```
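After installing, one way to confirm the package is visible to the current interpreter is to query its metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string when pteredactyl is installed, otherwise None.
    print("pteredactyl:", installed_version("pteredactyl"))
```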
310
+ ## Gradio Web App
311
 
312
+ This webapp is already available online as a Gradio app on Hugging Face: [Hugging Face Gradio App](https://huggingface.co/spaces/MattStammers/pteredactyl_PII). It is also available as [source](https://github.com/SETT-Centre-Data-and-AI/PteRedactyl) or as a [Docker image](https://registry.hub.docker.com/r/mattstammers/pteredactyl).
313
 
314
+ ## Docker Deployment
315
 
316
+ If deploying the Docker image, note that the app is bound to port 7860. The image can be built and run from source using the following commands:
317
 
318
  ```bat
319
  docker build -t pteredactyl:latest .
320
  docker run -d -p 7860:7860 --name pteredactyl-app pteredactyl:latest
321
  ```
322
 
323
+ Alternatively, the image can be deployed directly from [Docker Hub](https://registry.hub.docker.com/r/mattstammers/pteredactyl).
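Once the container is running, a quick reachability check for the published port can be sketched in Python. This assumes the app is served on `127.0.0.1:7860`, matching the `-p 7860:7860` mapping in the `docker run` command; the helper name `is_port_open` is hypothetical. It only confirms that something accepts TCP connections on the port, not that the Gradio app itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumes the container was started with the 7860:7860 port mapping.
    print("app reachable:", is_port_open("127.0.0.1", 7860))
```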
324
+
325
+ ## Contributions
326
+ Interested in contributing? Check out the contributing guidelines.
327
+
328
+ * [Developer Guide](https://github.com/MattStammers/Pteredactyl/blob/main/CONTRIBUTING.md)
329
+
330
+ Please note that this project follows the [GitHub code of conduct](https://docs.github.com/en/site-policy/github-terms/github-community-code-of-conduct). By contributing to this project, you agree to abide by its terms.
331
+
332
+ ## License
333
+ Pteredactyl was created at University Hospital Southampton NHSFT by the Research Data Science Team. It is licensed under the terms of the MIT license.
334
+
335
  ## Logo
336
 
337
+ <picture align="center">
338
+ <img alt="Pteredactyl Logo" src="https://raw.githubusercontent.com/MattStammers/Pteredactyl/main/src/pteredactyl_webapp/assets/img/Pteredactyl_Logo.jpg">
339
+ </picture>
340
+
341
+ # Abstract
342
+
343
+ - Authors: Matt Stammers🧪, Cai Davis🥼 and Michael George🩺
344
+
345
+ - Version 1. 29/06/2024
346
+
347
+ Clinical patient identifiable information (cPII) presents a significant challenge in natural language processing (NLP). It has yet to be fully resolved, but significant progress is being made [1,2].
348
+
349
+ This is why we created [Pteredactyl](https://pypi.org/project/pteredactyl/), a Python module to help with redaction of clinical free text.
350
+
351
+ Full documentation for the project can be found [here](https://github.com/MattStammers/Pteredactyl).
352
+
353
 
354
  ## Background
355
 
 
394
  ### References:
395
  1. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Feb 1;30(2):318–28.
396
  2. Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inf. 2022 Dec 1;168:104880.
397
+
398
  """
399
 
400
  description = """