Commit 57d59fc • Yiqiao Jin committed (parent: 53709ed)
Update demo and README

Changed files:
- README.md +53 -25
- agentreview/const.py +10 -0
- agentreview/ui/cli.py +1 -11
- demo.py +0 -217
- notebooks/barplot_similarity_between_review_metareview.ipynb +0 -0
- notebooks/demo.ipynb +0 -0
- notebooks/histplots.ipynb +0 -0
- notebooks/lineplots.ipynb +0 -0
- run.sh +12 -0
- run_paper_decision_cli.py +2 -26
- run_paper_review_cli.py +2 -59
README.md CHANGED
@@ -13,11 +13,10 @@ short_description: EMNLP 2024
 
 # AgentReview
 
-Official implementation for the [EMNLP 2024](https://2024.emnlp.org/) (
+Official implementation for the [EMNLP 2024](https://2024.emnlp.org/) main track (Oral) paper -- [AgentReview: Exploring Peer Review Dynamics with LLM Agents](https://arxiv.org/abs/2406.12708)
 
 * Website: [https://agentreview.github.io/](https://agentreview.github.io/)
 * Paper: [https://arxiv.org/abs/2406.12708](https://arxiv.org/abs/2406.12708)
-* **Note: This repository is under construction development. Please stay tuned!!**
 
 
@@ -34,49 +33,79 @@ Official implementation for the [EMNLP 2024](https://2024.emnlp.org/) (main)
 
 ---
 
-
-
+## Introduction
+
+AgentReview is a pioneering large language model (LLM)-based framework for simulating peer review processes, developed to analyze and address the complex, multivariate factors influencing review outcomes. Unlike traditional statistical methods, AgentReview captures latent variables while respecting the privacy of sensitive peer review data.
 
 ### Academic Abstract
 
-Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on exploration and statistics of existing peer review data, which do not adequately address the multivariate nature of the process, account for the latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation
+Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on exploration and statistics of existing peer review data, which do not adequately address the multivariate nature of the process, account for the latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation
+framework, which effectively disentangles the impacts of multiple latent factors and addresses the privacy issue. Our study reveals significant insights, including a notable 37.1% variation in paper decisions due to reviewers' biases, supported by sociological theories such as the social influence theory, altruism fatigue, and authority bias. We believe that this study could offer valuable insights to improve the design of peer review mechanisms.
 
 ![Review Stage Design](static/img/ReviewPipeline.png)
 
+## Getting Started
+
 ### Installation
 
+**Download the data**
+
+Download both zip files from the [Dropbox](https://www.dropbox.com/scl/fo/etzu5h8kwrx8vrcaep9tt/ALCnxFt2cT9aF477d-h1-E8?rlkey=9r5ep9psp8u4yaxxo9caf5nnc&st=k946oui5&dl=0):
+
+Unzip [AgentReview_Paper_Data.zip](https://www.dropbox.com/scl/fi/l17brtbzsy3xwflqd58ja/AgentReview_Paper_Data.zip?rlkey=vldiexmgzi7zycmz7pumgbooc&st=b6g3nkry&dl=0) under `data/`, which contains:
+1. The PDF versions of the papers
+2. The real-world peer reviews for ICLR 2020 - 2023
+
+```bash
+unzip AgentReview_Paper_Data.zip -d data/
+```
+
+(Optional) Unzip [AgentReview_LLM_Reviews.zip](https://www.dropbox.com/scl/fi/ckr0hpxyedx8u9s6235y6/AgentReview_LLM_Reviews.zip?rlkey=cgexir5xu38tm79eiph8ulbkq&st=q23x2trr&dl=0) under `outputs/`, which contains:
+1. The LLM-generated reviews (our LLM-generated dataset)
+
+```bash
+unzip AgentReview_LLM_Review.zip -d outputs/
+```
+
+**Install Required Packages**:
 ```
-cd AgentReview
+cd AgentReview/
 pip install -r requirements.txt
 ```
+
+**Set environment variables**
+
+If you use the OpenAI API, set OPENAI_API_KEY:
+
+```bash
+export OPENAI_API_KEY=...  # Format: sk-...
+```
+
+If you use the Azure OpenAI API, set the following:
+
+```bash
+export AZURE_ENDPOINT=...    # Format: https://<your-endpoint>.openai.azure.com/
+export AZURE_DEPLOYMENT=...  # Your Azure OpenAI deployment here
+export AZURE_OPENAI_KEY=...  # Your Azure OpenAI key here
+```
+
+**Running the Project**
+
+Set the environment variables in `run.sh` and run it:
+
+```bash
+bash run.sh
+```
+
+**Note: all project files should be run from the `AgentReview` directory.**
 
-- `app.py`: The main application file for running the framework.
-- `analysis/`: Contains Python scripts for various statistical analyses of review data.
-- `chatarena/`: Core module for simulating different review environments and integrating LLM backends.
-- `dataset/`: Scripts for handling dataset operations, such as downloading and processing submissions.
-- `demo/`: Demonstrative scripts showcasing the functionality of different components.
-- `docs/`: Documentation files and markdown guides for using and extending the framework.
-- `examples/`: Configuration files and examples to demonstrate the capabilities and setup of simulations.
-- `experiments/`: Experimental scripts to test new ideas or improvements on the framework.
-- `visual/`: Visualization scripts for generating insightful plots and charts from the simulation data.
-
-**[UNDER CONSTRUCTION]**
+**Demo**
+
+A demo can be found in `notebooks/demo.ipynb`
 
+## Framework Overview
+
 ### Stage Design
 
@@ -94,7 +123,6 @@ Our simulation adopts a structured, 5-phase pipeline
 
 - Sometimes the API can apply strict filtering to the request. You may need to adjust the content filtering to get the desired results.
 
-
 ## License
 
 This project is licensed under the Apache-2.0 License.
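The README distinguishes two credential setups: `OPENAI_API_KEY` for the OpenAI API, and `AZURE_ENDPOINT`/`AZURE_DEPLOYMENT`/`AZURE_OPENAI_KEY` for Azure OpenAI. A minimal sketch of how a launcher script might pick a client type from those variables; the function name `pick_openai_client_type` is illustrative, not part of the repository:

```python
import os


def pick_openai_client_type() -> str:
    """Choose a client type from the environment variables the README describes.

    Returns "azure_openai" when all three AZURE_* variables are set,
    "openai" when OPENAI_API_KEY is set, and raises otherwise.
    """
    azure_vars = ("AZURE_ENDPOINT", "AZURE_DEPLOYMENT", "AZURE_OPENAI_KEY")
    if all(os.environ.get(k) for k in azure_vars):
        return "azure_openai"
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    raise EnvironmentError("Set OPENAI_API_KEY or the AZURE_* variables first.")


if __name__ == "__main__":
    os.environ.setdefault("OPENAI_API_KEY", "sk-example")  # placeholder for illustration
    print(pick_openai_client_type())
```

This mirrors the `--openai_client_type` flag used by `run.sh`, but the selection logic here is only an assumption about how one could wire the two setups together.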
agentreview/const.py CHANGED
@@ -100,3 +100,13 @@ year2paper_ids = {
     "ICLR2024": [39, 247, 289, 400, 489, 742, 749] + [62, 78, 159, 161, 170, 192, 198, 215, 219, 335, 344, 386, 427, 432, 448, 451, 461, 472, 485, 536, 546, 559, 573, 577, 597] + [5, 9, 11, 19, 20, 30, 31, 32, 40, 49, 52, 53, 54, 56, 61, 66, 67, 73, 74, 77, 85, 87, 100, 104, 114, 116, 124, 130, 133, 138, 145, 151, 153, 156, 165, 166, 172, 181, 183, 187, 195, 204, 212, 221, 224, 230, 237, 243, 248, 257, 258, 259, 263, 272, 278, 287, 288, 291, 292, 298, 300, 302, 304, 306, 308, 318, 320, 321, 324, 325, 326, 327, 331, 332, 334, 336, 338, 340, 345, 349, 350, 356, 357, 358, 360] + [1, 2, 12, 14, 24, 26, 33, 35, 36, 41, 42, 44, 50, 51, 55, 57, 59, 70, 72, 75, 76, 81, 89, 90, 93,
 94, 97, 99, 101, 105, 110, 111, 112, 117, 119, 120, 125, 128, 129, 131, 134, 135, 140, 148, 150, 157, 158, 163, 167, 173, 175, 177, 182, 185, 186, 188, 189, 197, 202, 207, 209, 210, 214, 216, 226, 231, 234, 236, 238, 239, 241, 244, 245, 249, 260, 262, 264, 265, 271, 276, 277, 279, 281, 282, 284, 286, 290, 294, 295, 301, 303, 307, 309, 313, 315, 319, 322, 333, 337, 339, 342, 354, 363, 364, 369, 373, 374, 375, 377, 378, 381, 382, 385, 388, 398, 399, 401, 407, 412, 413, 415, 416, 417, 420, 421, 422, 426, 428, 436, 437, 444, 446, 449, 453, 454, 463, 464, 469, 478, 480, 487, 490, 496, 498, 501, 502, 504, 506, 513, 516, 517, 518, 520, 521, 523, 524, 525, 537, 541, 545, 551, 552, 554, 555, 558, 562, 563, 574, 575, 579, 581, 584, 588, 595, 596, 598, 607, 608, 615, 622, 624, 625, 627, 629, 630, 634, 636, 641, 645, 647, 648, 651, 652, 654, 655, 662, 667, 668, 671, 672, 673, 681, 682, 685, 689, 690, 691, 697, 698, 701]
 }
+AGENTREVIEW_LOGO = r"""
+                          _   _____            _
+    /\                   | | |  __ \          (_)
+   /  \   __ _  ___ _ __ | |_| |__) |_____ ___ _____      __
+  / /\ \ / _` |/ _ \ '_ \| __|  _  // _ \ \ / / |/ _ \ \ /\ / /
+ / ____ \ (_| |  __/ | | | |_| | \ \  __/\ V /| |  __/\ V  V /
+/_/    \_\__, |\___|_| |_|\__|_|  \_\___| \_/ |_|\___| \_/\_/
+          __/ |
+         |___/
+"""
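The logo above is defined with a raw triple-quoted string (`r"""..."""`) so the many backslashes in the ASCII art stay literal instead of being read as escape sequences. A small sketch of the same idiom with a hypothetical banner (not the real AgentReview logo):

```python
# Raw string: the backslashes below are kept exactly as typed.
BANNER = r"""
 /\_/\
( o.o )  demo banner, not the real AgentReview logo
 > ^ <
"""


def banner_width(banner: str) -> int:
    """Width of the widest line, e.g. for centering the banner in a terminal."""
    return max(len(line) for line in banner.splitlines() or [""])
```

Without the `r` prefix, sequences such as `\_` would still survive (Python leaves unknown escapes alone) but would raise a `SyntaxWarning` on recent Python versions, so raw strings are the safe choice for art.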
agentreview/ui/cli.py CHANGED
@@ -14,19 +14,10 @@ from agentreview.utility.utils import get_rebuttal_dir, load_llm_ac_decisions, \
     save_llm_ac_decisions
 from ..arena import Arena, TooManyInvalidActions
 from ..backends.human import HumanBackendError
+from ..const import AGENTREVIEW_LOGO
 from ..environments import PaperReview, PaperDecision
 
 # Get the ASCII art from https://patorjk.com/software/taag/#p=display&f=Big&t=Chat%20Arena
-ASCII_ART = r"""
-                          _   _____            _
-    /\                   | | |  __ \          (_)
-   /  \   __ _  ___ _ __ | |_| |__) |_____ ___ _____      __
-  / /\ \ / _` |/ _ \ '_ \| __|  _  // _ \ \ / / |/ _ \ \ /\ / /
- / ____ \ (_| |  __/ | | | |_| | \ \  __/\ V /| |  __/\ V  V /
-/_/    \_\__, |\___|_| |_|\__|_|  \_\___| \_/ |_|\___| \_/\_/
-          __/ |
-         |___/
-"""
 
 color_dict = {
     "red": Fore.RED,
@@ -87,7 +78,6 @@ class ArenaCLI:
 
         console = Console()
         # Print ascii art
-        console.print(ASCII_ART, style="bold dark_orange3")
         timestep = self.arena.reset()
         console.print("AgentReview Initialized!", style="bold green")
demo.py DELETED
@@ -1,217 +0,0 @@
-#!/usr/bin/env python
-# coding: utf-8
-
-# # AgentReview
-#
-# In this tutorial, you will explore customizing the AgentReview experiment.
-#
-# Venue: EMNLP 2024 (Oral)
-#
-# arXiv: [https://arxiv.org/abs/2406.12708](https://arxiv.org/abs/2406.12708)
-#
-# Website: [https://agentreview.github.io/](https://agentreview.github.io/)
-#
-# ```bibtex
-# @inproceedings{jin2024agentreview,
-#     title={AgentReview: Exploring Peer Review Dynamics with LLM Agents},
-#     author={Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},
-#     booktitle={EMNLP},
-#     year={2024}
-# }
-# ```
-
-# In[2]:
-
-import os
-
-import numpy as np
-
-from agentreview import const
-
-os.environ["OPENAI_API_VERSION"] = "2024-06-01-preview"
-
-# ## Overview
-#
-# AgentReview features a range of customizable variables, such as characteristics of reviewers,
-# authors, area chairs (ACs), as well as the reviewing mechanisms
-
-# In[3]:
-
-# ## Review Pipeline
-#
-# The simulation adopts a structured, 5-phase pipeline (Section 2 in the [paper](https://arxiv.org/abs/2406.12708)):
-#
-# * **I. Reviewer Assessment.** Each manuscript is evaluated by three reviewers independently.
-# * **II. Author-Reviewer Discussion.** Authors submit rebuttals to address reviewers' concerns.
-# * **III. Reviewer-AC Discussion.** The AC facilitates discussions among reviewers, prompting updates to their initial assessments.
-# * **IV. Meta-Review Compilation.** The AC synthesizes the discussions into a meta-review.
-# * **V. Paper Decision.** The AC makes the final decision on whether to accept or reject the paper, based on all gathered inputs.
-
-# In[4]:
-
-import os
-
-if os.path.basename(os.getcwd()) == "notebooks":
-    # Change the working directory to AgentReview
-    os.chdir("..")
-    print(f"Changing the current working directory to {os.path.basename(os.getcwd())}")
-
-# In[5]:
-
-from argparse import Namespace
-
-args = Namespace(openai_key=None,
-                 deployment=None,
-                 openai_client_type='azure_openai',
-                 endpoint=None,
-                 api_version='2023-05-15',
-                 ac_scoring_method='ranking',
-                 conference='ICLR2024',
-                 num_reviewers_per_paper=3,
-                 ignore_missing_metareviews=False,
-                 overwrite=False,
-                 num_papers_per_area_chair=10,
-                 model_name='gpt-4o',
-                 output_dir='outputs',
-                 max_num_words=16384,
-                 visual_dir='outputs/visual',
-                 device='cuda',
-                 data_dir='./data',  # Directory to all paper PDFs
-                 acceptance_rate=0.32,
-                 task='paper_review')
-
-os.environ['OPENAI_API_VERSION'] = args.api_version
-
-# In[13]:
-
-malicious_Rx1_setting = {
-    "AC": [
-        "BASELINE"
-    ],
-
-    "reviewer": [
-        "malicious",
-        "BASELINE",
-        "BASELINE"
-    ],
-
-    "author": [
-        "BASELINE"
-    ],
-    "global_settings": {
-        "provides_numeric_rating": ['reviewer', 'ac'],
-        "persons_aware_of_authors_identities": []
-    }
-}
-
-all_settings = {"malicious_Rx1": malicious_Rx1_setting}
-args.experiment_name = "malicious_Rx1_setting"
-
-# `malicious_Rx1` means 1 reviewer is a malicious reviewer, and the other reviewers are default (i.e. `BASELINE`) reviewers.
-
-# ## Reviews
-#
-# Define the review pipeline
-
-# In[10]:
-
-from agentreview.environments import PaperReview
-
-def review_one_paper(paper_id, setting):
-    paper_decision = paper_id2decision[paper_id]
-
-    experiment_setting = get_experiment_settings(paper_id=paper_id,
-                                                 paper_decision=paper_decision,
-                                                 setting=setting)
-    print(f"Paper ID: {paper_id} (Decision in {args.conference}: {paper_decision})")
-
-    players = initialize_players(experiment_setting=experiment_setting, args=args)
-
-    player_names = [player.name for player in players]
-
-    env = PaperReview(player_names=player_names, paper_decision=paper_decision, paper_id=paper_id,
-                      args=args, experiment_setting=experiment_setting)
-
-    arena = PaperReviewArena(players=players, environment=env, args=args)
-    arena.launch_cli(interactive=False)
-
-# In[11]:
-
-import os
-import sys
-
-sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "agentreview")))
-
-from agentreview.paper_review_settings import get_experiment_settings
-from agentreview.paper_review_arena import PaperReviewArena
-from agentreview.utility.experiment_utils import initialize_players
-from agentreview.utility.utils import project_setup, get_paper_decision_mapping
-
-# In[14]:
-
-sampled_paper_ids = [39]
-
-paper_id2decision, paper_decision2ids = get_paper_decision_mapping(args.data_dir, args.conference)
-
-for paper_id in sampled_paper_ids:
-    review_one_paper(paper_id, malicious_Rx1_setting)
-
-def run_paper_decision():
-    args.task = "paper_decision"
-
-    # Make sure the same set of papers always go through the same AC no matter which setting we choose
-    NUM_PAPERS = len(const.year2paper_ids[args.conference])
-    order = np.random.choice(range(NUM_PAPERS), size=NUM_PAPERS, replace=False)
-
-    # Paper IDs we actually used in experiments
-    experimental_paper_ids = []
-
-    # For papers that have not been decided yet, load their metareviews
-    metareviews = []
-    print("Shuffling paper IDs")
-    sampled_paper_ids = np.array(const.year2paper_ids[args.conference])[order]
-
-    # Exclude papers that already have AC decisions
-    existing_ac_decisions = load_llm_ac_decisions(output_dir=args.output_dir,
-                                                  conference=args.conference,
-                                                  model_name=args.model_name,
-                                                  ac_scoring_method=args.ac_scoring_method,
-                                                  experiment_name=args.experiment_name,
-                                                  num_papers_per_area_chair=args.num_papers_per_area_chair)
-
-    sampled_paper_ids = [paper_id for paper_id in sampled_paper_ids if paper_id not in existing_ac_decisions]
-
-# In[ ]:
notebooks/barplot_similarity_between_review_metareview.ipynb DELETED
The diff for this file is too large to render. See raw diff.

notebooks/demo.ipynb CHANGED
The diff for this file is too large to render. See raw diff.

notebooks/histplots.ipynb DELETED
The diff for this file is too large to render. See raw diff.

notebooks/lineplots.ipynb DELETED
The diff for this file is too large to render. See raw diff.
run.sh ADDED
@@ -0,0 +1,12 @@
+# If you use the OpenAI API, you need to set OPENAI_API_KEY.
+export OPENAI_API_KEY=...
+
+# If you use the Azure OpenAI API, you need to set the following.
+export AZURE_ENDPOINT=...    # Format: https://<your-endpoint>.openai.azure.com/
+export AZURE_DEPLOYMENT=...  # Your Azure OpenAI deployment here
+export AZURE_OPENAI_KEY=...  # Your Azure OpenAI key here
+
+python run_paper_review_cli.py --conference ICLR2024 \
+    --openai_client_type azure_openai \
+    --data_dir data \
+    --experiment_name malicious_Rx1
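The added `run.sh` exports credentials and then launches the review CLI with a fixed set of flags. The same invocation can be sketched from Python; the placeholder values are hypothetical, and the final `subprocess.run` call is left commented out because it only works inside the AgentReview repository with valid credentials:

```python
import os
import subprocess  # used by the commented-out launch line below

# Credentials mirroring the exports in run.sh (placeholders, not real values).
env = dict(os.environ,
           AZURE_ENDPOINT="https://<your-endpoint>.openai.azure.com/",
           AZURE_DEPLOYMENT="<your-deployment>",
           AZURE_OPENAI_KEY="<your-key>")

# The command run.sh executes, as an argument list.
cmd = ["python", "run_paper_review_cli.py",
       "--conference", "ICLR2024",
       "--openai_client_type", "azure_openai",
       "--data_dir", "data",
       "--experiment_name", "malicious_Rx1"]

print(" ".join(cmd))
# subprocess.run(cmd, env=env, check=True)  # uncomment inside the AgentReview repo
```

Passing `env=` keeps the credentials scoped to the child process instead of mutating the parent shell, which is one reason to prefer this over `os.system`.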
run_paper_decision_cli.py CHANGED
@@ -39,6 +39,8 @@ def main(args):
     """
     args.task = "paper_decision"
 
+    print(const.AGENTREVIEW_LOGO)
+
     # Sample Paper IDs from each category
     paper_id2decision, paper_decision2ids = get_paper_decision_mapping(args.data_dir, args.conference)
 
@@ -98,31 +100,6 @@ def main(args):
 
     players = initialize_players(experiment_setting=experiment_setting, args=args)
 
-    # players = []
-    # for role, players_li in experiment_setting["players"].items():
-    #     for i, player_config in enumerate(players_li):
-    #         # This phase should only contain the Area Chair
-    #         if role == "AC":
-    #             player_config = get_ac_config(env_type="paper_decision",
-    #                                           scoring_method=args.ac_scoring_method,
-    #                                           num_papers_per_area_chair=args.num_papers_per_area_chair,
-    #                                           global_settings=experiment_setting['global_settings'],
-    #                                           acceptance_rate=args.acceptance_rate,
-    #                                           **player_config)
-    #             # player_config = AgentConfig(**player_config)
-    #             player_config['model'] = args.model_name
-    #             player = AreaChair(**player_config)
-    #         else:
-    #             raise NotImplementedError(f"Unknown role: {role}")
-    #         players.append(player)
-
     player_names = [player.name for player in players]
 
     if batch_index >= num_batches - 1:  # Last batch. Include all remaining papers
@@ -139,7 +116,6 @@ def main(args):
     arena = PaperReviewArena(players=players, environment=env, args=args, global_prompt=const.GLOBAL_PROMPT)
     arena.launch_cli(interactive=False)
 
-
 if __name__ == "__main__":
     project_setup()
     main(parse_args())
run_paper_review_cli.py CHANGED
@@ -45,6 +45,8 @@ def main(args: Namespace):
 
     args.task = "paper_review"
 
+    print(const.AGENTREVIEW_LOGO)
+
     paper_id2decision, paper_decision2ids = get_paper_decision_mapping(args.data_dir, args.conference)
 
     # Sample paper IDs for the simulation from existing data.
@@ -67,65 +69,6 @@ def main(args: Namespace):
 
     player_names = [player.name for player in players]
 
-    # for role, players_list in experiment_setting["players"].items():
-    #     for i, player_config in enumerate(players_list):
-    #         if role == "Paper Extractor":
-    #             player_config = get_paper_extractor_config(global_settings=experiment_setting['global_settings'])
-    #             player = PaperExtractorPlayer(data_dir=args.data_dir, paper_id=paper_id,
-    #                                           paper_decision=paper_decision,
-    #                                           args=args,
-    #                                           conference=args.conference, **player_config)
-    #             player_names.append(player.name)
-    #         elif role == "AC":
-    #             player_config = get_ac_config(env_type="paper_review",
-    #                                           scoring_method=args.ac_scoring_method,
-    #                                           num_papers_per_area_chair=args.num_papers_per_area_chair,
-    #                                           global_settings=experiment_setting['global_settings'],
-    #                                           acceptance_rate=args.acceptance_rate,
-    #                                           **player_config)
-    #             player_config['model'] = args.model_name
-    #             player = AreaChair(data_dir=args.data_dir,
-    #                                conference=args.conference,
-    #                                args=args,
-    #                                **player_config)
-    #             player_names.append(player.name)
-    #         elif role == "Author":
-    #             # Author requires no behavior customization,
-    #             # so we directly use the Player class
-    #             player_config = get_author_config()
-    #             player = Player(data_dir=args.data_dir,
-    #                             conference=args.conference,
-    #                             args=args,
-    #                             **player_config)
-    #             player_names.append(player.name)
-    #         elif role == "Reviewer":
-    #             player_config = get_reviewer_player_config(reviewer_index=i + 1,
-    #                                                        global_settings=experiment_setting['global_settings'],
-    #                                                        **player_config)
-    #             player_config['model'] = args.model_name
-    #             player = Reviewer(data_dir=args.data_dir, conference=args.conference, **player_config)
-    #             player_names.append(player.name)
-    #         else:
-    #             raise NotImplementedError(f"Unknown role: {role}")
-    #         players.append(player)
-
     env = PaperReview(player_names=player_names, paper_decision=paper_decision, paper_id=paper_id,
                       args=args, experiment_setting=experiment_setting)