Spaces: holistic-ai/explainbility_benchmark

Zekun Wu committed on Jun 23, 2024

Commit

9b04c57

1 Parent(s): ef3367f

add

Files changed (1)

util/evaluator.py +11 -10

@@ -32,23 +32,23 @@ class evaluator:
         Factually Correct:
         Definition: The explanation must be accurate and relevant to the question and the subject matter.
-        Score: (0-1) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
+        Score: (0-10) How factually correct is the explanation? Consider the accuracy of the details provided and their relevance to the question.
         Useful:
         Definition: The explanation should enable the user to understand the answer better and should facilitate further reasoning or decision-making.
-        Score: (0-1) How useful is the explanation in helping the user understand the answer and make informed decisions?
+        Score: (0-10) How useful is the explanation in helping the user understand the answer and make informed decisions?
         Context Specific:
         Definition: The explanation should be relevant to the specific context or scenario implied by the question.
-        Score: (0-1) How well does the explanation address the specific context or scenario of the question?
+        Score: (0-10) How well does the explanation address the specific context or scenario of the question?
         User Specific:
         Definition: The explanation should cater to the knowledge level and interests of the user, assuming typical or specified user characteristics.
-        Score: (0-1) How well does the explanation cater to the needs and knowledge level of the intended user?
+        Score: (0-10) How well does the explanation cater to the needs and knowledge level of the intended user?
         Provides Pluralism:
         Definition: The explanation should offer or accommodate multiple viewpoints or interpretations, allowing the user to explore various perspectives.
-        Score: (0-1) How well does the explanation provide or support multiple perspectives?
+        Score: (0-10) How well does the explanation provide or support multiple perspectives?
         After evaluating the provided question and explanation based on the five principles, please format your scores and justifications in a JSON dictionary. Directly provide me with the JSON without any additional text.
@@ -56,23 +56,23 @@ class evaluator:
         {{
         "Factually Correct": {{
             "Justification": "The explanation is mostly accurate with only minor inaccuracies.",
-            "Score": 0.9
+            "Score": 9
         }},
         "Useful": {{
             "Justification": "The explanation is very helpful in understanding the main concept.",
-            "Score": 0.85
+            "Score": 8.5
         }},
         "Context Specific": {{
             "Justification": "The explanation is generally relevant to the specific context but lacks some detail.",
-            "Score": 0.8
+            "Score": 8
         }},
         "User Specific": {{
             "Justification": "The explanation is appropriate for the typical user but may be too technical for some.",
-            "Score": 0.75
+            "Score": 7.5
         }},
         "Provides Pluralism": {{
             "Justification": "The explanation considers multiple perspectives but could include more viewpoints.",
-            "Score": 0.7
+            "Score": 7
         }}
     }}
@@ -164,6 +164,7 @@ def write_evaluation_commentary(scores):
     evaluation_details = []
     for principle, details in scores.items():
+        print(details)
         score = details.get('Score', -1)
         justification = details.get('Justification', '')
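
The hunks above move the judge prompt from a 0-1 scale to a 0-10 scale, so code that reads `Score` may want to check the range and, where older consumers still expect 0-1 values, convert back. The following is a minimal sketch under the assumption that the model replies with the JSON shape shown in the prompt. `parse_scores`, `normalize`, `PRINCIPLES` and `MAX_SCORE` are hypothetical names for illustration and are not part of `util/evaluator.py`.

```python
import json

# Assumption: upper bound of the 0-10 scale introduced by this commit.
MAX_SCORE = 10

# The five principles named in the prompt.
PRINCIPLES = (
    "Factually Correct",
    "Useful",
    "Context Specific",
    "User Specific",
    "Provides Pluralism",
)


def parse_scores(raw_json, max_score=MAX_SCORE):
    """Parse the judge's JSON reply and range-check every score.

    Hypothetical helper: raises ValueError if a principle is missing,
    a score is not numeric, or a score falls outside 0..max_score.
    """
    scores = json.loads(raw_json)
    for principle in PRINCIPLES:
        details = scores.get(principle)
        if not isinstance(details, dict):
            raise ValueError(f"missing principle: {principle}")
        # Same -1 default as write_evaluation_commentary in the diff.
        score = details.get("Score", -1)
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ValueError(f"non-numeric score for {principle}: {score!r}")
        if not 0 <= score <= max_score:
            raise ValueError(f"score out of range for {principle}: {score}")
    return scores


def normalize(scores, max_score=MAX_SCORE):
    """Map validated 0..max_score scores back onto the old 0-1 scale."""
    return {p: scores[p]["Score"] / max_score for p in PRINCIPLES}
```

Note that a range check alone cannot tell a reply written on the old scale (for example `0.9`) from a genuinely low score on the new one, since both fall inside 0-10.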