Yu (Hope) Hou committed on
Commit 224cb2c · 1 Parent(s): 6820a49

update FE display

Files changed (1)
  1. src/about.py +3 -9
src/about.py CHANGED
@@ -71,26 +71,20 @@ E.g. {'guess': 'Apple', 'confidence': 0.02}
  Reminder: If you are playing around with an extractive QA model already, HF QA models output the `score` already, so you only need to wrap the `score` to `confidence`.
 
  #### Customized retriever
- If you didnt submit anything for retriever, we will feed the `context` string with our pre-loaded context. However, we do provide the option for you to customize your retriever model with the dataset you wish to do retrieval. Please check the tutorial example for more details.
+ If you didn't submit anything for retriever, we will feed the `context` string with our pre-loaded context. However, we do provide the option for you to customize your retriever model with the dataset you wish to do retrieval. Please check the tutorial example for more details.
 
  ## Evaluation Metric
- For each question in the test set, we parsed it into multiple runs and fed each run as the question to your pipeline. Then we use the confidence scores calculated for all runs to get the Buzz Confidence.
+ In our Grounded QA task, we evaluate the QA model's reliability of their performance by measuring their calibration estimates where we consider the confidence of guess confidence values. To understand this concept better, we adopt the concept of "buzz" in Trivia Quiz, where buzz happens whenever the player is confident enough to predict the correct guess in the middle of a question. This also applies to our measurement of model calibration as we focus whether the model prediction probability matches its prediction accuracy. Our evaluation metric, `Average Expected Buzz`, quantifies the expected buzz confidence estimation.
 
  ## FAQ
  What if my system type is not specified here or not supported yet?
  - Please have a private post to instructors so we could check how we could adapt the leaderboard for your purpose. Thanks!
 
- I dont understand where I could start to build a QA system for submission.
+ I don't understand where I could start to build a QA system for submission.
  - Please check our submission tutorials. From there, you could fine-tune or do anything above the base models.
 
  I want to use API-based QA systems for submission, like GPT4. What should I do?
  - We don't support API-based models now but you could train your model with the GPT cache we provided: https://github.com/Pinafore/nlp-hw/tree/master/models.
-
- I want to test my model locally before submission. How could I do that?
- - In addition to tutorial test, please also ensure your model could be loaded with the below code, so it could pass the frontend check.
- ```
- AutoConfig.from_pretrained(model_name, revision="main", trust_remote_code=True/False, token=ACCESS_TOKEN)
- ```
  """
 
  EVALUATION_QUEUE_TEXT = """