IAMJB committed
Commit c38692c · verified · 1 Parent(s): e8158c1

Update README.md

Files changed (1): README.md (+35 -1)
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🌖
 colorFrom: pink
 colorTo: indigo
 sdk: gradio
-sdk_version: 4.44.1
+sdk_version: 5.8.0
 app_file: app.py
 pinned: false
 license: mit
@@ -12,3 +12,37 @@ python_version: 3.8
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+## Citation
+
+<!-- https://arxiv.org/abs/2405.03595 -->
+
+**BibTeX:**
+```
+@inproceedings{ostmeier-etal-2024-green,
+    title = "{GREEN}: Generative Radiology Report Evaluation and Error Notation",
+    author = "Ostmeier, Sophie and
+      Xu, Justin and
+      Chen, Zhihong and
+      Varma, Maya and
+      Blankemeier, Louis and
+      Bluethgen, Christian and
+      Md, Arne Edward Michalson and
+      Moseley, Michael and
+      Langlotz, Curtis and
+      Chaudhari, Akshay S and
+      Delbrouck, Jean-Benoit",
+    editor = "Al-Onaizan, Yaser and
+      Bansal, Mohit and
+      Chen, Yun-Nung",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
+    month = nov,
+    year = "2024",
+    address = "Miami, Florida, USA",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.findings-emnlp.21",
+    doi = "10.18653/v1/2024.findings-emnlp.21",
+    pages = "374--390",
+    abstract = "Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to its medical nature. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.",
+}
+```
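
For context alongside the citation, here is a minimal sketch of scoring candidate reports with the GREEN metric this Space is built around. It assumes the paper's open-source `green_score` package and its `GREEN` scorer; the checkpoint id `StanfordAIMI/GREEN-radllama2-7b`, the example report strings, and the exact return signature are illustrative assumptions, not something defined by this commit.

```python
# Hypothetical usage sketch of the GREEN metric (assumes the open-source
# `green_score` package; the class, arguments, model id, and return values
# below are assumptions drawn from the paper's release, not this Space's code).
from green_score import GREEN

# Illustrative checkpoint and reference/candidate report pairs (placeholders).
scorer = GREEN("StanfordAIMI/GREEN-radllama2-7b", output_dir=".")
refs = ["No acute cardiopulmonary process."]
hyps = ["No evidence of acute cardiopulmonary disease."]

# Assumed outputs: mean GREEN score (1.0 = no clinically significant errors),
# its standard deviation, per-pair scores, text error summaries, and a table.
mean, std, scores, summary, result_df = scorer(refs, hyps)
print(mean)
```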