import gradio as gr
import pandas as pd

# Path to the CSV file holding the leaderboard results
csv_file_path = "formatted_data.csv"

# Read the leaderboard into a DataFrame
df = pd.read_csv(csv_file_path)

# Markdown text rendered below the leaderboard table
markdown_text = """
## Benchmark Overview
- The benchmark evaluates the performance of Olas Predict tools on the Autocast dataset.
- The dataset has been refined to enhance the evaluation of the tools.
- The leaderboard shows the performance of the tools based on the refined dataset.
- The script to run the benchmark is available in the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).

## How to run your tools on the benchmark
- Fork the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
- Initialize the git submodules and update them to get the latest dataset and `mech` tools.
    - `git submodule init`
    - `git submodule update --remote --recursive`
- Include your tool in the `mech/packages` directory accordingly.
    - Guidelines on how to include your tool can be found [here](xxx).
- Run the benchmark script.

## Dataset Overview
This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474). 
The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools. 
Both the original and refined datasets are hosted on HuggingFace.

### Refined Dataset Files
- You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
- `autocast_questions_filtered.json`: A JSON subset of the initial autocast dataset.
- `autocast_questions_filtered.pkl`: A pickle file mapping URLs to their respective scraped documents within the filtered dataset.
- `retrieved_docs.pkl`: Contains all the scraped texts.

### Filtering Criteria
To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
- URLs not returning HTTP 200 status codes are excluded.
- Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
- Links whose content has fewer than 1,000 words are removed.
- Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.
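
The criteria above can be read as a per-URL check followed by a per-sample check. The snippet below is a minimal sketch of that reading, not the script used to build the dataset: the function names, the tuple layout, and the exact domain list are hypothetical, and it assumes that a sample is dropped (rather than truncated) when it has more than 20 working URLs.

```python
from urllib.parse import urlparse

# Hypothetical constants; only the rules themselves come from the list above.
BLOCKED_DOMAINS = ("twitter.com", "bloomberg.com")
MIN_WORDS = 1000
MIN_URLS, MAX_URLS = 5, 20


def is_working_url(url, status_code, word_count):
    # Per-URL rules: HTTP 200, not a difficult-to-scrape site, enough text.
    host = urlparse(url).netloc.lower()
    blocked = any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
    return status_code == 200 and not blocked and word_count >= MIN_WORDS


def keep_sample(url_records):
    # url_records: list of (url, status_code, word_count) tuples for one question.
    working = [r for r in url_records if is_working_url(*r)]
    # Assumption: samples outside the 5-20 range are dropped entirely.
    return MIN_URLS <= len(working) <= MAX_URLS
```

For example, a question with four working URLs would be dropped, while one with five would be kept.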

### Scraping Approach
The content of the filtered URLs has been scraped using various libraries, depending on the source:
- `pypdf2` for PDF URLs.
- `wikipediaapi` for Wikipedia pages.
- `requests`, `readability-lxml`, and `html2text` for most other sources.
- `requests`, `beautifulsoup`, and `html2text` for BBC links.
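
One way to picture the routing above is a small dispatcher that inspects each URL and names the library stack to use. This is an illustrative sketch only: `pick_scraper` is a hypothetical helper, the host and extension checks are assumptions about how sources might be told apart, and it does not call any of the scraping libraries.

```python
from urllib.parse import urlparse


def pick_scraper(url):
    # Return a label for the scraping route; the labels are illustrative only.
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    if path.endswith(".pdf"):
        return "pypdf2"
    if host.endswith("wikipedia.org"):
        return "wikipediaapi"
    if host.endswith("bbc.com") or host.endswith("bbc.co.uk"):
        return "requests + beautifulsoup + html2text"
    # Fallback for most other sources.
    return "requests + readability-lxml + html2text"
```

The checks run from most to least specific, so a PDF is routed to `pypdf2` whatever its host, and any unrecognized source falls through to the general-purpose stack.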
"""


with gr.Blocks() as demo:
    gr.Markdown("# Olas Predict Benchmark")
    gr.Markdown("Leaderboard showing the performance of Olas Predict tools on the Autocast dataset and overview of the project.")
    gr.DataFrame(df)
    gr.Markdown(markdown_text)

demo.launch()