---
title: BrowserGym Leaderboard
emoji: π
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---
# BrowserGym Leaderboard
This leaderboard tracks performance of various agents on web navigation tasks.
## How to Submit Results for New Agents
### 1. Create Results Directory

Create a new folder in the `results` directory with your agent's name:

```
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
```
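If it helps, the folder and empty placeholder files can be scaffolded with a short script. This is only a minimal sketch, not part of the leaderboard tooling; the agent name and file list below are placeholders matching the tree above:

```python
from pathlib import Path

# Placeholder agent name; replace with your own.
agent = "your-agent-name"
files = ["README.md", "webarena.json", "workarena-l1.json",
         "workarena-l2.json", "workarena-l3.json", "miniwob.json"]

agent_dir = Path("results") / agent
agent_dir.mkdir(parents=True, exist_ok=True)
for name in files:
    (agent_dir / name).touch()  # create an empty placeholder file
print(f"Created {agent_dir} with {len(files)} placeholder files")
```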
### 2. Add Agent Details

Create a `README.md` in your agent's folder with the following details:

#### Required Information
- Model Name: Base model used (e.g., GPT-4, Claude-2)
- Model Architecture: Architecture details and any modifications
- Input/Output Format: How inputs are processed and outputs generated
- Training Details: Training configuration, if applicable
  - Dataset used
  - Number of training steps
  - Hardware used
  - Training time
#### Optional Information
- Paper Link: Link to published paper/preprint if available
- Code Repository: Link to public code implementation
- Additional Notes: Any special configurations or requirements
- License: License information for your agent
Make sure to organize the information in clear sections using Markdown.
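For illustration only, a README.md for a hypothetical agent could be organized like this (every value below is a placeholder, not a requirement):

```markdown
# your-agent-name

## Model Name
GPT-4 (placeholder)

## Model Architecture
Base model with a custom action-space wrapper (placeholder)

## Input/Output Format
Accessibility-tree observations as input; BrowserGym actions as output (placeholder)

## Training Details
- Dataset used: N/A (zero-shot)
- Number of training steps: N/A
- Hardware used: N/A
- Training time: N/A

## Optional
- Paper Link: https://example.com/paper (placeholder)
- Code Repository: https://example.com/repo (placeholder)
- License: MIT
```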
### 3. Add Benchmark Results

Create separate JSON files for each benchmark following this format:
```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab",
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No",
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
```
Please add the results for each benchmark in a separate JSON file, named as follows:

- `webarena.json`
- `workarena-l1.json`
- `workarena-l2.json`
- `workarena-l3.json`
- `miniwob.json`
Each file must contain a JSON array with a single object following the format above. The `benchmark` field in each file must match the benchmark name exactly (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, `MiniWoB`), and the filename must be the lowercase benchmark name with a `.json` extension.
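A quick way to sanity-check your result files before opening a PR is a small script along these lines. It is not part of the leaderboard tooling, just an illustrative sketch; the filename-to-benchmark mapping follows the naming rule above:

```python
import json
import sys
from pathlib import Path

# Expected benchmark name for each results file, per the naming rule above.
EXPECTED = {
    "webarena.json": "WebArena",
    "workarena-l1.json": "WorkArena-L1",
    "workarena-l2.json": "WorkArena-L2",
    "workarena-l3.json": "WorkArena-L3",
    "miniwob.json": "MiniWoB",
}

# Keys that every entry must contain, per the JSON format above.
REQUIRED_KEYS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}

def check(path: Path) -> None:
    entries = json.loads(path.read_text())
    assert isinstance(entries, list), f"{path.name}: expected a JSON array"
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        assert not missing, f"{path.name}: missing keys {sorted(missing)}"
        assert entry["benchmark"] == EXPECTED[path.name], (
            f"{path.name}: benchmark should be {EXPECTED[path.name]}"
        )

if __name__ == "__main__":
    # Usage: python check_results.py results/your-agent-name/*.json
    for arg in sys.argv[1:]:
        check(Path(arg))
        print(f"{arg}: OK")
```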
### 4. Submit PR

- Open the community tab and press "New Pull Request"
- Give the PR a descriptive title and follow the steps shown
- Publish the branch
## How to Submit Reproducibility Results for Existing Agents
Open the results file for the agent and benchmark whose results you reproduced.
### 1. Add Reproduced Results

Append the following entry to the JSON file. Ensure you set `original_or_reproduced` to `Reproduced`.
```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab",
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No",
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Reproduced"
    }
]
```
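One way to append the new entry without disturbing the existing ones is a short script like the sketch below. The path and field values are placeholders; fill in the agent, benchmark, and study details you actually reproduced:

```python
import json
from pathlib import Path

# Placeholder path; point this at the file for the agent/benchmark you reproduced.
path = Path("results/your-agent-name/webarena.json")

# Placeholder values; replace with your own reproduction details.
reproduced_entry = {
    "agent_name": "your-agent-name",
    "study_id": "unique-study-identifier-from-agentlab",
    "date_time": "2024-01-01 12:00:00",
    "benchmark": "WebArena",
    "score": 0.0,
    "std_err": 0.0,
    "benchmark_specific": "No",
    "benchmark_tuned": "No",
    "followed_evaluation_protocol": "Yes",
    "reproducible": "Yes",
    "comments": "Reproduced with the released agent configuration",
    "original_or_reproduced": "Reproduced",
}

entries = json.loads(path.read_text())   # existing array with the original entry
entries.append(reproduced_entry)         # add the reproduced entry at the end
path.write_text(json.dumps(entries, indent=4) + "\n")
```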
### 2. Submit PR

- Open the community tab and press "New Pull Request"
- Give the PR a descriptive title and follow the steps shown
- Publish the branch
## License

MIT