先坤 committed
Commit • f407e0e
1 parent: 9ee1a60

update

Files changed:
- README.md (+135 -21)
- images/GREEDRL-Logo-Original-320x320.png (+0 -0)
- images/GREEDRL-Logo-Original-640.png (+0 -0)

README.md CHANGED

- Reinforcement Learning
- Vehicle Routing Problem
---

<div align="center">
<img src="./images/GREEDRL-Logo-Original-640.png" width = "515" height = "380"/>
</div>

# ✊GreedRL

# Introduction

## Architecture design
The entire architecture is divided into three layers:

* **High-performance Env framework**

The constraints and optimization objectives of the problems to be solved are defined in the Reinforcement Learning (RL) Environment (Env).
For performance and ease-of-use reasons, the Env framework provides two implementations: one based on **PyTorch** and one based on **CUDA C++**.

To make problem definition easier for developers, the framework abstracts a number of variables that represent the environment's state; they are generated automatically once declared by the user. When defining constraints and optimization objectives, developers can refer to the declared variables directly (see the VRPTW modeling example below).
Currently, various VRP variants such as CVRP, VRPTW and PDPTW, as well as problems such as Batching, are supported.

* **Pluggable NN components**

The framework provides a set of neural network (NN) components, and developers can also implement custom NN components.

* **High-performance NN operators**

To achieve maximum performance, the framework implements high-performance operators tailored to Combinatorial Optimization (CO) problems that replace the corresponding PyTorch operators, for example Masked Addition Attention and Masked Softmax Sampling. A minimal PyTorch reference for these two operators is sketched after the diagram below.

<div align="center">
<img src="./images/GREEDRL-Framwork.png" width = "515" height = "380"/>
</div>
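
For readers who have not seen these operators, the snippet below is a minimal plain-PyTorch sketch of what the two operations compute; the function names, tensor shapes and the additive scoring are illustrative assumptions, not GreedRL's own high-performance implementations or API.

```python
import torch

def masked_addition_attention(query, keys, v, mask):
    # Additive (Bahdanau-style) attention over candidate nodes: score = v . tanh(q + k),
    # with infeasible nodes (mask == True) excluded before the softmax.
    # query: [batch, hidden], keys: [batch, nodes, hidden], v: [hidden], mask: [batch, nodes] (bool)
    scores = torch.tanh(query.unsqueeze(1) + keys) @ v          # [batch, nodes]
    scores = scores.masked_fill(mask, float('-inf'))
    return torch.softmax(scores, dim=-1)

def masked_softmax_sampling(logits, mask):
    # Sample one feasible node per instance from the masked softmax distribution.
    # Assumes every row has at least one unmasked entry.
    logits = logits.masked_fill(mask, float('-inf'))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)  # [batch]
```

The dedicated operators presumably avoid the intermediate tensors and extra kernel launches that the plain-PyTorch version above incurs.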

## Network design
The neural network adopts the Seq2Seq architecture commonly used in Natural Language Processing (NLP), with a Transformer used in the encoding part and an RNN used in the decoding part, as shown in the diagram below.

<div align="center">
<img src="./images/GREEDRL-Network.png" width = "515" height = "380"/>
</div>
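
As a rough orientation only (not the actual GreedRL network), the sketch below shows the overall shape of such an architecture: a Transformer encoder embeds the problem nodes once, and a GRU-based decoder then scores the nodes and picks one per step. All layer sizes, names and the greedy selection are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class Seq2SeqRoutingNet(nn.Module):
    # Transformer encoder over problem nodes + RNN (GRU) decoder that selects one node per step.
    def __init__(self, node_dim=3, hidden=128, heads=8, layers=3):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden)
        enc_layer = nn.TransformerEncoderLayer(hidden, heads, dim_feedforward=4 * hidden,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.decoder = nn.GRUCell(hidden, hidden)
        self.score = nn.Linear(hidden, hidden, bias=False)

    def forward(self, nodes, steps):
        # nodes: [batch, n, node_dim], e.g. (x, y, demand) per node
        emb = self.encoder(self.embed(nodes))              # [batch, n, hidden]
        graph = emb.mean(dim=1)                            # graph embedding
        state, last = graph, graph                         # decoder state and "last visited" input
        tour = []
        for _ in range(steps):
            state = self.decoder(last, state)              # one RNN decoding step
            logits = torch.einsum('bh,bnh->bn', self.score(state), emb)
            choice = logits.argmax(dim=-1)                 # greedy pick; a real model would first
            tour.append(choice)                            # mask visited/infeasible nodes (cf. mask_task below)
            last = emb[torch.arange(emb.size(0)), choice]
        return torch.stack(tour, dim=1)                    # [batch, steps] node indices
```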

## Modeling examples

### VRP with Time Windows (VRPTW)

<details>
<summary>VRPTW</summary>

```python
import torch

from greedrl import Problem, Solution, Solver
from greedrl.feature import *
from greedrl.variable import *
from greedrl.function import *
from greedrl.model import runner
from greedrl.myenv import VrptwEnv

# Problem features (the static inputs of an instance).
features = [continuous_feature('worker_weight_limit'),
            continuous_feature('worker_ready_time'),
            continuous_feature('worker_due_time'),
            continuous_feature('worker_basic_cost'),
            continuous_feature('worker_distance_cost'),
            continuous_feature('task_demand'),
            continuous_feature('task_weight'),
            continuous_feature('task_ready_time'),
            continuous_feature('task_due_time'),
            continuous_feature('task_service_time'),
            continuous_feature('distance_matrix')]

# State variables that are generated automatically once declared and can be
# referenced directly in the constraints and objectives below.
variables = [task_demand_now('task_demand_now', feature='task_demand'),
             task_demand_now('task_demand_this', feature='task_demand', only_this=True),
             feature_variable('task_weight'),
             feature_variable('task_due_time'),
             feature_variable('task_ready_time'),
             feature_variable('task_service_time'),
             worker_variable('worker_weight_limit'),
             worker_variable('worker_due_time'),
             worker_variable('worker_basic_cost'),
             worker_variable('worker_distance_cost'),
             worker_used_resource('worker_used_weight', task_require='task_weight'),
             worker_used_resource('worker_used_time', 'distance_matrix', 'task_service_time', 'task_ready_time',
                                  'worker_ready_time'),
             edge_variable('distance_last_to_this', feature='distance_matrix', last_to_this=True),
             edge_variable('distance_this_to_task', feature='distance_matrix', this_to_task=True),
             edge_variable('distance_task_to_end', feature='distance_matrix', task_to_end=True)]


class Constraint:

    def do_task(self):
        return self.task_demand_this

    def mask_task(self):
        # tasks that have already been completed
        mask = self.task_demand_now <= 0
        # vehicle capacity (weight) limit
        worker_weight_limit = self.worker_weight_limit - self.worker_used_weight
        mask |= self.task_demand_now * self.task_weight > worker_weight_limit[:, None]

        # task time windows: arriving after the task's due time is infeasible
        worker_used_time = self.worker_used_time[:, None] + self.distance_this_to_task
        mask |= worker_used_time > self.task_due_time

        # the vehicle must also be able to finish and return before its own due time
        worker_used_time = torch.max(worker_used_time, self.task_ready_time)
        worker_used_time += self.task_service_time
        worker_used_time += self.distance_task_to_end
        mask |= worker_used_time > self.worker_due_time[:, None]

        return mask

    def finished(self):
        return torch.all(self.task_demand_now <= 0, 1)


class Objective:

    def step_worker_start(self):
        return self.worker_basic_cost

    def step_worker_end(self):
        return self.distance_last_to_this * self.worker_distance_cost

    def step_task(self):
        return self.distance_last_to_this * self.worker_distance_cost
```

</details>
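
To make the declared feature names above concrete, here is a toy instance written out as plain tensors. The shapes, value scales and the convention that row 0 of the distance matrix is the depot are our own assumptions for illustration; this is not a documented GreedRL input format.

```python
import torch

n_task = 4                              # number of customer tasks
coords = torch.rand(n_task + 1, 2)      # depot (row 0, assumed) + customers in the unit square

toy_instance = {
    # worker (vehicle) features
    'worker_weight_limit':  torch.tensor([200.0]),
    'worker_ready_time':    torch.tensor([0.0]),
    'worker_due_time':      torch.tensor([1000.0]),
    'worker_basic_cost':    torch.tensor([100.0]),
    'worker_distance_cost': torch.tensor([1.0]),
    # task (customer) features
    'task_demand':       torch.randint(1, 10, (n_task,)),
    'task_weight':       torch.ones(n_task),
    'task_ready_time':   torch.zeros(n_task),
    'task_due_time':     torch.full((n_task,), 500.0),
    'task_service_time': torch.full((n_task,), 10.0),
    # pairwise travel distances between depot and tasks
    'distance_matrix': torch.cdist(coords, coords),
}
```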

### Pickup and Delivery Problem with Time Windows (PDPTW)

## 🏆Award

# 🤠GreedRL-VRP-pretrained model

## Model description

You can use these models for solving vehicle routing problems (VRPs) with reinforcement learning (RL).

## How to use

### Requirements
This library requires Python == 3.8. [Miniconda](https://docs.conda.io/en/latest/miniconda.html#system-requirements) / [Anaconda](https://docs.anaconda.com/anaconda/install/) is our recommended Python distribution.

```bash
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
```

You need to compile first and add the resulting library `greedrl_c` to the `PYTHONPATH`:

```bash
python setup.py build

export PYTHONPATH={root_path}/greedrl/build/lib.linux-x86_64-cpython-38/
```
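
A quick way to check that the path is set correctly is to import the compiled module; this check is our suggestion, not part of the project's instructions.

```python
import greedrl_c  # compiled extension built above; raises ImportError if PYTHONPATH is not set correctly
```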

### Training

We provide examples of the Capacitated VRP (CVRP) for training and inference.

1. Training data

We use generated data for the training phase: the customer and depot locations are randomly generated in the unit square [0,1] X [0,1].

For the CVRP, we assume that the demand of each node is a discrete number in {1,...,9}, chosen uniformly at random. (An illustrative instance-generator sketch is given after the training command below.)

2. Start training

```bash
cd examples/cvrp

python train.py --model_filename cvrp_5000.pt --problem_size 5000
```
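
As a concrete reading of the data description in step 1, a generator along the following lines produces such instances. It is an illustrative sketch (including the vehicle capacity value), not the sampling code actually used by `train.py`.

```python
import torch

def generate_cvrp_instances(batch_size, problem_size, capacity=50, device='cpu'):
    # Depot and customer coordinates are uniform in the unit square [0, 1] x [0, 1];
    # customer demands are integers drawn uniformly from {1, ..., 9}.
    return {
        'depot':     torch.rand(batch_size, 1, 2, device=device),
        'customers': torch.rand(batch_size, problem_size, 2, device=device),
        'demands':   torch.randint(1, 10, (batch_size, problem_size), device=device),
        'capacity':  torch.full((batch_size,), capacity, device=device),
    }

batch = generate_cvrp_instances(batch_size=64, problem_size=5000)
```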

### Evaluation

We provide pretrained models for different CVRP problem sizes, such as `cvrp_100`, `cvrp_1000`, `cvrp_2000` and `cvrp_5000`, that you can use directly for inference.

```bash
cd examples/cvrp

python solve.py --device cuda --model_name cvrp_5000.pt --problem_size 5000
```

## Support

images/GREEDRL-Logo-Original-320x320.png DELETED (binary file, 17.4 kB)

images/GREEDRL-Logo-Original-640.png ADDED