cyberosa committed
Commit • f8a7cbc
1 Parent(s): 1cd55fe
Small adjustment on the about benchmark
tabs/faq.py CHANGED (+7 -5)
@@ -1,13 +1,15 @@
 about_olas_predict_benchmark = """\
 How good are LLMs at making predictions about events in the future? This is a topic that hasn't been well explored to date.
 [Olas Predict](https://olas.network/services/prediction-agents) aims to rectify this by incentivizing the creation of agents that make predictions about future events (through prediction markets).
-These agents are tested in the wild on real-time prediction market data, which you can see on [here](https://huggingface.co/datasets/valory/prediction_market_data) on HuggingFace (updated weekly)
-
-
+These agents are tested in the wild on real-time prediction market data, which you can see on [here](https://huggingface.co/datasets/valory/prediction_market_data) on HuggingFace (updated weekly).\
+
+However, if you want to create an agent with new tools, waiting for real-time results to arrive is slow. This is where the Olas Predict Benchmark comes in. It allows devs to backtest new approaches on a historical event forecasting dataset (refined from [Autocast](https://arxiv.org/abs/2206.15474)) with high iteration speed.
 
-π π§ The autocast dataset resolved-questions are from a timeline ending in 2022. Thus the current reported accuracy measure might be an in-sample forecasting one.
-
+π π§ The autocast dataset resolved-questions are from a timeline ending in 2022, so the models might be trained on some of these data. Thus the current reported accuracy measure might be an in-sample forecasting one.
+However, we can learn about the relative strengths of the different approaches (e.g models and logic), before testing the most promising ones on real-time unseen data.
+This HF Space showcases the performance of the various models and workflows (called tools in the Olas ecosystem) for making predictions, in terms of accuracy and cost.\
 
+
 π€ Pick a tool and run it on the benchmark using the "π₯ Run the Benchmark" page!
 """
 
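The in-sample caveat added in this commit can be illustrated with a minimal sketch. It assumes each resolved question carries a resolution date and that an approximate training cutoff for the model is known. The `ResolvedQuestion` record, its field names, the sample data, and the cutoff date are all hypothetical and are not taken from the Autocast schema or from the benchmark code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ResolvedQuestion:
    # Hypothetical record; field names are illustrative, not the Autocast schema.
    resolved_on: date
    predicted: str
    actual: str


def accuracy(questions):
    """Fraction of questions answered correctly, or None for an empty list."""
    if not questions:
        return None
    return sum(q.predicted == q.actual for q in questions) / len(questions)


def split_by_cutoff(questions, training_cutoff):
    """Separate questions resolved on or before the cutoff from later ones."""
    in_sample = [q for q in questions if q.resolved_on <= training_cutoff]
    out_of_sample = [q for q in questions if q.resolved_on > training_cutoff]
    return in_sample, out_of_sample


# Synthetic data, for illustration only.
questions = [
    ResolvedQuestion(date(2021, 5, 1), "yes", "yes"),
    ResolvedQuestion(date(2022, 3, 1), "no", "no"),
    ResolvedQuestion(date(2022, 11, 1), "yes", "no"),
    ResolvedQuestion(date(2022, 12, 1), "no", "no"),
]
cutoff = date(2022, 6, 30)  # assumed training cutoff, not a real model's

in_sample, out_of_sample = split_by_cutoff(questions, cutoff)
print("possibly in-sample accuracy:", accuracy(in_sample))
print("out-of-sample accuracy:", accuracy(out_of_sample))
```

Reporting the two figures separately shows how much of a tool's score may come from questions the model could have seen during training. Since the dataset timeline ends in 2022, the out-of-sample bucket may be small or empty for recent models, which is why the text defers the final comparison to real-time unseen data.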