Spaces: emilylearning/llm_uncertainty

emilylearning committed on Jul 17, 2023

Commit a828a08

1 Parent(s): 25dd383

update explanations / comments


Files changed (4)

README.md +6 -12
app.py +24 -7
spec_metric_result.png +0 -0
winogender_sentences.py +1 -1

README.md CHANGED

@@ -1,19 +1,13 @@
 ---
-title: Llm Uncertainty
-emoji: 👀
-colorFrom: indigo
-colorTo: green
 sdk: gradio
-sdk_version: 3.1.7
 app_file: app.py
 pinned: false
 ---
-# Setup env:
-```
-python3 -m venv venv_llm
-source venv_llm/bin/activate
-pip install -r requirements.txt
-```
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Uncertainty
+emoji: 🐠
+colorFrom: pink
+colorTo: pink
 sdk: gradio
+sdk_version: 3.9
 app_file: app.py
 pinned: false
+license: mit
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

app.py CHANGED

@@ -210,15 +210,30 @@ demo = gr.Blocks()
 with demo:
     input_texts = gr.Variable([])
     gr.Markdown("**Detect Task Specification at Inference-time.**")
-    gr.Markdown("**Follow the numbered steps below to test one of the pre-loaded options.** Once you get the hang of it, you can load a new model and/or provide your own input texts.")
     gr.Markdown(f"""1) Pick a preloaded BERT-like model.
         *Note: RoBERTa-large performance is best.*
-    2) Pick an Occupation type from the Winogender Schemas evaluation set.
         *Or select '{PICK_YOUR_OWN_LABEL}' (it need not be about an occupation).*
-    3) Click button to load input texts.
         *Read the sentences to determine which two are well-specified for gendered pronoun coreference resolution. The rest are gender-unspecified.*
-    4) Click button to get Task Specification Metric results!
     """)
@@ -249,7 +264,7 @@ with demo:
         )
     with gr.Row():
-        get_text_btn = gr.Button("3) Click to load input texts.)")
     get_text_btn.click(
         fn=display_input_texts,
@@ -262,7 +277,9 @@ with demo:
     with gr.Row():
         uncertain_btn = gr.Button("4) Click to get Task Specification Metric results!")
     gr.Markdown(
-        "If there is an * by a sentence number, then at least one top prediction for that sentence was non-gendered.")
     with gr.Row():
         female_fig = gr.Plot(type="auto")
@@ -270,7 +287,7 @@ with demo:
         female_df = gr.Dataframe()
     with gr.Row():
         display_text = gr.Textbox(
-            type="auto", label="Sample of text fed to model")
     uncertain_btn.click(
         fn=predict_gender_pronouns,

 with demo:
     input_texts = gr.Variable([])
     gr.Markdown("**Detect Task Specification at Inference-time.**")
+    gr.Markdown("""This method exploits the specification-induced spurious correlations demonstrated in this
+                [Spurious Correlations Hugging Face Space](https://huggingface.co/spaces/anonymousauthorsanonymous/spurious) to detect task specification at inference-time.
+                For this method, well-specified tasks should have a lower specification metric value, and unspecified tasks should have a higher specification metric value.
+                """)
+    gr.Markdown("""As an example, see the figure below with test sentences from the [Winogender schema](https://aclanthology.org/N18-2002/) for the occupation of `Doctor`.
+                With a close read, you can see that only sentence numbers (3) and (4) are well-specified for the gendered pronoun resolution task:
+                the masked pronoun is coreferent with the `man` or `woman`. The remainder are unspecified: the masked pronoun is coreferent with a gender-unspecified person.
+                In this example we have 100\% accurate detection, with the specification metric near zero for only sentences (3) and (4).
+                <p align="center">
+                <img src="file/spec_metric_result.png" alt="results" width="500"/>
+                </p>
+                """)
+    gr.Markdown("**To test this for yourself, follow the numbered steps below to test one of the pre-loaded options.** Once you get the hang of it, you can load a new model and/or provide your own input texts.")
     gr.Markdown(f"""1) Pick a preloaded BERT-like model.
         *Note: RoBERTa-large performance is best.*
+    2) Pick an Occupation type from the Winogender Schemas evaluation set.
         *Or select '{PICK_YOUR_OWN_LABEL}' (it need not be about an occupation).*
+    3) Click the first button to load input texts.
         *Read the sentences to determine which two are well-specified for gendered pronoun coreference resolution. The rest are gender-unspecified.*
+    4) Click the second button to get Task Specification Metric results.
     """)
         )
     with gr.Row():
+        get_text_btn = gr.Button("3) Click to load input texts.")
     get_text_btn.click(
         fn=display_input_texts,
     with gr.Row():
         uncertain_btn = gr.Button("4) Click to get Task Specification Metric results!")
     gr.Markdown(
+        """We expect a lower specification metric value for well-specified tasks.
+        Note: If there is an * by a sentence number, then at least one top prediction for that sentence was non-gendered.""")
     with gr.Row():
         female_fig = gr.Plot(type="auto")
         female_df = gr.Dataframe()
     with gr.Row():
         display_text = gr.Textbox(
+            type="text", label="Sample of text fed to model")
     uncertain_btn.click(
         fn=predict_gender_pronouns,
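
The rule of thumb stated in the added explanatory text (well-specified sentences should show a lower specification metric value) can be sketched as a simple threshold check. This is only an illustration: the metric values, the `THRESHOLD` cutoff, and the helper `flag_well_specified` are hypothetical and are not taken from `app.py`, which does not define a cutoff in this diff.

```python
# Hypothetical sketch: the values and the cutoff below are made up for
# illustration; they are not produced by or defined in app.py.
THRESHOLD = 0.1  # assumed cutoff for "near zero"


def flag_well_specified(metric_by_sentence, threshold=THRESHOLD):
    """Return the sentence numbers whose metric value is below the threshold."""
    return sorted(
        number
        for number, value in metric_by_sentence.items()
        if value < threshold
    )


# Made-up metric values for six test sentences, keyed by sentence number.
example_metrics = {1: 0.62, 2: 0.48, 3: 0.02, 4: 0.03, 5: 0.55, 6: 0.71}

well_specified = flag_well_specified(example_metrics)
print(well_specified)  # [3, 4]
```

With these made-up values, only sentences 3 and 4 fall below the assumed cutoff, mirroring the `Doctor` example described in the text. Any real threshold would need to be chosen from the metric values the app actually reports.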

spec_metric_result.png ADDED

winogender_sentences.py CHANGED

@@ -1,6 +1,6 @@
 ######################################################################
 ##
-## This script is a lightly modifed version fo taht provided in winogender-schemas
 ## https://github.com/rudinger/winogender-schemas
 ##
 ######################################################################

 ######################################################################
 ##
+## This script is a lightly modified version of that provided in winogender-schemas
 ## https://github.com/rudinger/winogender-schemas
 ##
 ######################################################################