---
title: BrowserGym Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---

BrowserGym Leaderboard

This leaderboard tracks the performance of various agents on web navigation tasks.

How to Submit Results for New Agents

1. Create Results Directory

Create a new folder in the results directory with your agent's name:

results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
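
If you prefer to script this step, the minimal Python sketch below (not part of the leaderboard tooling) creates the folder and empty placeholder files; the agent name is a placeholder, and it assumes you run it from the repository root:

from pathlib import Path

# Placeholder agent name; replace with your own.
agent_name = "your-agent-name"

agent_dir = Path("results") / agent_name
agent_dir.mkdir(parents=True, exist_ok=True)

# Create the README and one empty results file per benchmark.
for filename in [
    "README.md",
    "webarena.json",
    "workarena-l1.json",
    "workarena-l2.json",
    "workarena-l3.json",
    "miniwob.json",
]:
    (agent_dir / filename).touch()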

2. Add Agent Details

Create a README.md in your agent's folder with the following details:

Required Information

  • Model Name: Base model used (e.g., GPT-4, Claude-2)
  • Model Architecture: Architecture details and any modifications
  • Input/Output Format: How inputs are processed and outputs generated
  • Training Details: Training configuration if applicable
    • Dataset used
    • Number of training steps
    • Hardware used
    • Training time

Optional Information

  • Paper Link: Link to published paper/preprint if available
  • Code Repository: Link to public code implementation
  • Additional Notes: Any special configurations or requirements
  • License: License information for your agent

Make sure to organize the information in clear sections using Markdown.
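
For reference, a minimal README.md skeleton covering these sections might look like the following; all values are placeholders for a hypothetical agent:

# Your Agent Name

## Model Name
GPT-4 (placeholder)

## Model Architecture
Short description of the architecture and any modifications.

## Input/Output Format
How observations are processed and how actions are generated.

## Training Details
- Dataset used: ...
- Number of training steps: ...
- Hardware used: ...
- Training time: ...

## Optional Information
- Paper Link: ...
- Code Repository: ...
- Additional Notes: ...
- License: ...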

3. Add Benchmark Results

Create separate JSON files for each benchmark following this format:

[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]

Please add the results for each benchmark as a separate JSON file, named as follows:

  • webarena.json
  • workarena-l1.json
  • workarena-l2.json
  • workarena-l3.json
  • miniwob.json

Each file must contain a JSON array with a single object following the format above. The benchmark field in each file must exactly match one of the benchmark names (WebArena, WorkArena-L1, WorkArena-L2, WorkArena-L3, MiniWoB), and the filename must be the lowercase benchmark name followed by .json.
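
Before opening a PR, you may want to sanity-check your files. The minimal Python sketch below is illustrative, not part of the leaderboard tooling (check_results_file and the filename-to-benchmark mapping are assumptions for this example); it verifies that each file is a JSON array whose entries contain the required fields and a benchmark value matching the filename:

import json
from pathlib import Path

# Filename -> expected value of the "benchmark" field, following the naming rule above.
EXPECTED_BENCHMARKS = {
    "webarena.json": "WebArena",
    "workarena-l1.json": "WorkArena-L1",
    "workarena-l2.json": "WorkArena-L2",
    "workarena-l3.json": "WorkArena-L3",
    "miniwob.json": "MiniWoB",
}

REQUIRED_KEYS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}

def check_results_file(path: Path) -> None:
    # Raises AssertionError if the file does not follow the expected format.
    entries = json.loads(path.read_text())
    assert isinstance(entries, list), f"{path.name}: top level must be a JSON array"
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        assert not missing, f"{path.name}: missing keys {sorted(missing)}"
        expected = EXPECTED_BENCHMARKS[path.name]
        assert entry["benchmark"] == expected, (
            f"{path.name}: benchmark must be {expected!r}, got {entry['benchmark']!r}"
        )

# Example usage for a hypothetical agent folder:
for name in EXPECTED_BENCHMARKS:
    check_results_file(Path("results") / "your-agent-name" / name)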

4. Submit PR

  1. Open the community tab and press "New Pull Request"
  2. Give the PR a descriptive title and follow the steps shown
  3. Publish the branch

How to Submit Reproducibility Results for Existing Agents

Open the results file for the agent and benchmark whose results you reproduced.

1. Add reproduced results

Append the following entry to the corresponding JSON file. Ensure you set original_or_reproduced to "Reproduced".

[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab", 
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No", 
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Reproduced"
    }
]
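
If you prefer to append the entry programmatically, a minimal Python sketch might look like this; the agent name, file path, and field values are placeholders to adjust for the agent and benchmark you reproduced:

import json
from pathlib import Path

# Placeholder path; point this at the agent/benchmark you reproduced.
results_file = Path("results") / "existing-agent-name" / "webarena.json"

# Placeholder values; fill in your actual study details.
new_entry = {
    "agent_name": "existing-agent-name",
    "study_id": "unique-study-identifier-from-agentlab",
    "date_time": "YYYY-MM-DD HH:MM:SS",
    "benchmark": "WebArena",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "Yes/No",
    "benchmark_tuned": "Yes/No",
    "followed_evaluation_protocol": "Yes/No",
    "reproducible": "Yes/No",
    "comments": "Additional details",
    "original_or_reproduced": "Reproduced",  # must be "Reproduced" for this workflow
}

entries = json.loads(results_file.read_text())
entries.append(new_entry)
results_file.write_text(json.dumps(entries, indent=4) + "\n")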

2. Submit PR

  1. Open the community tab and press "New Pull Request"
  2. Give the PR a descriptive title and follow the steps shown
  3. Publish the branch

License

MIT