---
title: BrowserGym Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
license: mit
---

# BrowserGym Leaderboard

This leaderboard tracks the performance of various agents on web navigation tasks.

## How to Submit Results for New Agents

### 1. Create a Results Directory

Create a new folder in the `results` directory with your agent's name:

```bash
results/
└── your-agent-name/
    ├── README.md
    ├── webarena.json
    ├── workarena-l1.json
    ├── workarena-l2.json
    ├── workarena-l3.json
    └── miniwob.json
```

### 2. Add Agent Details

Create a `README.md` in your agent's folder with the following details:

#### Required Information

- **Model Name**: Base model used (e.g., GPT-4, Claude-2)
- **Model Architecture**: Architecture details and any modifications
- **Input/Output Format**: How inputs are processed and outputs are generated
- **Training Details**: Training configuration, if applicable
  - Dataset used
  - Number of training steps
  - Hardware used
  - Training time

#### Optional Information

- **Paper Link**: Link to a published paper/preprint, if available
- **Code Repository**: Link to a public code implementation
- **Additional Notes**: Any special configurations or requirements
- **License**: License information for your agent

Organize this information into clear sections using Markdown.

### 3. Add Benchmark Results

Create a separate JSON file for each benchmark, following this format:

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab",
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No",
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Original"
    }
]
```

Add each benchmark's results in its own JSON file, named as follows:

- `webarena.json`
- `workarena-l1.json`
- `workarena-l2.json`
- `workarena-l3.json`
- `miniwob.json`

Each file must contain a JSON array with a single object in the format above. The `benchmark` field in each file must exactly match one of the benchmark names (`WebArena`, `WorkArena-L1`, `WorkArena-L2`, `WorkArena-L3`, `MiniWoB`), and the filename must be the lowercase benchmark name followed by `.json`.

### 4. Submit a PR

1. Open the Community tab and press "New Pull Request".
2. Give the PR a descriptive title and follow the steps shown.
3. Publish the branch.

## How to Submit Reproducibility Results for Existing Agents

Open the results file for the agent and benchmark whose results you reproduced.

### 1. Add Reproduced Results

Append the following entry to the JSON file. Make sure `original_or_reproduced` is set to `Reproduced`.

```json
[
    {
        "agent_name": "your-agent-name",
        "study_id": "unique-study-identifier-from-agentlab",
        "date_time": "YYYY-MM-DD HH:MM:SS",
        "benchmark": "WebArena",
        "score": 0.0,
        "std_err": 0.0,
        "benchmark_specific": "Yes/No",
        "benchmark_tuned": "Yes/No",
        "followed_evaluation_protocol": "Yes/No",
        "reproducible": "Yes/No",
        "comments": "Additional details",
        "original_or_reproduced": "Reproduced"
    }
]
```

### 2. Submit a PR

1. Open the Community tab and press "New Pull Request".
2. Give the PR a descriptive title and follow the steps shown.
3. Publish the branch.

## License

MIT
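
## Checking Submission Files Locally (Optional)

Before opening a PR, you may want to sanity-check your result files. The sketch below is not part of the official leaderboard tooling; it is a minimal example that assumes the `results/<your-agent-name>/` layout, filenames, and required fields described above. The script name `check_submission.py` and its command-line interface are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal sketch for sanity-checking benchmark result files before a PR.

Assumptions: the results/<agent-name>/ layout, filenames, and required fields
described in this README. The script itself is illustrative, not official.
"""

import json
import sys
from pathlib import Path

# Filename -> exact value expected in the "benchmark" field (per the README).
EXPECTED_FILES = {
    "webarena.json": "WebArena",
    "workarena-l1.json": "WorkArena-L1",
    "workarena-l2.json": "WorkArena-L2",
    "workarena-l3.json": "WorkArena-L3",
    "miniwob.json": "MiniWoB",
}

# Fields every entry must contain (taken from the JSON format above).
REQUIRED_FIELDS = {
    "agent_name", "study_id", "date_time", "benchmark", "score", "std_err",
    "benchmark_specific", "benchmark_tuned", "followed_evaluation_protocol",
    "reproducible", "comments", "original_or_reproduced",
}


def check_agent_dir(agent_dir: Path) -> list:
    """Return a list of human-readable problems found in one agent folder."""
    problems = []
    for filename, benchmark_name in EXPECTED_FILES.items():
        path = agent_dir / filename
        if not path.exists():
            problems.append(f"missing file: {path}")
            continue
        try:
            entries = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"{path}: invalid JSON ({exc})")
            continue
        if not isinstance(entries, list) or not entries:
            problems.append(f"{path}: expected a non-empty JSON array")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                problems.append(f"{path} entry {i}: expected a JSON object")
                continue
            missing = REQUIRED_FIELDS - entry.keys()
            if missing:
                problems.append(f"{path} entry {i}: missing fields {sorted(missing)}")
            if entry.get("benchmark") != benchmark_name:
                problems.append(
                    f"{path} entry {i}: 'benchmark' should be '{benchmark_name}'"
                )
            if entry.get("original_or_reproduced") not in ("Original", "Reproduced"):
                problems.append(
                    f"{path} entry {i}: 'original_or_reproduced' must be "
                    "'Original' or 'Reproduced'"
                )
    return problems


if __name__ == "__main__":
    # Usage (hypothetical): python check_submission.py results/your-agent-name
    if len(sys.argv) != 2:
        sys.exit("usage: python check_submission.py results/<your-agent-name>")
    issues = check_agent_dir(Path(sys.argv[1]))
    if issues:
        print("\n".join(issues))
        sys.exit(1)
    print("All benchmark files look well-formed.")
```

Running it against your agent folder prints any missing files, malformed JSON, missing fields, or mismatched `benchmark` labels, and exits non-zero if problems are found.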