zhengrongzhang committed on
Commit
32865f3
1 Parent(s): 962ea6f

init model

Files changed (10)
  1. README.md +148 -0
  2. coco.names +80 -0
  3. coco2017.data +4 -0
  4. general_json2yolo.py +179 -0
  5. onnx_inference.py +151 -0
  6. onnx_test.py +1247 -0
  7. requirements.txt +29 -0
  8. utils.py +213 -0
  9. yolov3-8.onnx +3 -0
  10. yolov3.cfg +788 -0
README.md ADDED
@@ -0,0 +1,148 @@
---
license: apache-2.0
datasets:
- COCO
metrics:
- mAP
language:
- en
tags:
- RyzenAI
- object-detection
- vision
- YOLO
- Pytorch
---

# YOLOv3 model trained on COCO

YOLOv3 was trained on the COCO object detection dataset (118k annotated images) at a resolution of 416x416. It was released at https://github.com/ultralytics/yolov3/tree/v8.

We developed a modified version that is supported by [AMD Ryzen AI](https://ryzenai.docs.amd.com).


## Model description

YOLOv3 🚀 is the world's most loved vision AI, representing Ultralytics open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.


## Intended uses & limitations

You can use the raw model for object detection. See the [model hub](https://huggingface.co/models?search=amd/yolov3) to find all available YOLOv3 models.


## How to use

### Installation

Follow the [Ryzen AI Installation](https://ryzenai.docs.amd.com/en/latest/inst.html) guide to prepare the environment for Ryzen AI.
Then run the following command to install the prerequisites for this model.
```bash
pip install -r requirements.txt
```
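
After installation, you can optionally confirm that ONNX Runtime sees the Ryzen AI execution provider. This quick check is not part of the repository; it is only a sanity test of the environment:

```python
# Optional environment sanity check (illustrative, not part of this repo).
import onnxruntime

print(onnxruntime.get_available_providers())
# With a correctly installed Ryzen AI environment, 'VitisAIExecutionProvider'
# should appear in this list; otherwise only CPU/CUDA providers are shown.
```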


### Data Preparation (optional: for accuracy evaluation)

The MSCOCO2017 dataset contains 118,287 images for training and 5,000 images for validation.

1. Download the COCO dataset.
2. Run general_json2yolo.py to generate the labels folder and val2017.txt:
```sh
python general_json2yolo.py
```
Finally, the COCO dataset should look like this:
```plain
+ coco/
  + annotations/
    + instances_val2017.json
    + ...
  + images/
    + val2017/
      + 000000000139.jpg
      + 000000000285.jpg
      + ...
  + labels/
    + val2017/
      + 000000000139.txt
      + 000000000285.txt
      + ...
  + val2017.txt
```
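
To double-check the conversion before running the evaluation, you can inspect one entry of `val2017.txt` and its label file. The sketch below is illustrative (not part of the repository) and assumes the layout above; each label line is `class x_center y_center width height`, with coordinates normalized to [0, 1]:

```python
# Illustrative check of the converted dataset (assumes the layout shown above).
from pathlib import Path

coco_root = Path("coco")
first_image = (coco_root / "val2017.txt").read_text().splitlines()[0]
print("first image:", first_image)

# The label file mirrors the image name: images/val2017/xxx.jpg -> labels/val2017/xxx.txt
label_file = coco_root / "labels" / "val2017" / (Path(first_image).stem + ".txt")
for line in label_file.read_text().splitlines():
    cls, xc, yc, w, h = line.split()
    print(f"class={cls} center=({xc}, {yc}) size=({w}, {h})  # normalized to [0, 1]")
```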



### Test & Evaluation

- Code snippet from [`onnx_inference.py`](onnx_inference.py) showing how to run the model
```python
onnx_path = "yolov3-8.onnx"
onnx_model = onnxruntime.InferenceSession(
    onnx_path, providers=providers, provider_options=provider_options)

path = opt.img
new_path = os.path.join(opt.out, "demo_infer.jpg")

conf_thres, iou_thres, classes, agnostic_nms, max_det = 0.25, \
    0.45, None, False, 1000

img0 = cv2.imread(path)
img = pre_process(img0)
onnx_input = {onnx_model.get_inputs()[0].name: img}
onnx_output = onnx_model.run(None, onnx_input)
onnx_output = post_process(onnx_output)

pred = non_max_suppression(
    onnx_output[0],
    conf_thres,
    iou_thres,
    multi_label=False,
    classes=classes,
    agnostic=agnostic_nms)

colors = [[random.randint(0, 255) for _ in range(3)]
          for _ in range(len(names))]
det = pred[0]
im0 = img0.copy()

if len(det):
    # Rescale boxes from imgsz to im0 size
    det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

    # Write results
    for *xyxy, conf, cls in reversed(det):
        label = '%s %.2f' % (names[int(cls)], conf)
        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)])

# Stream results
cv2.imwrite(new_path, im0)
```

- Run inference for a single image
```sh
python onnx_inference.py --img INPUT_IMG_PATH --out OUTPUT_DIR --ipu --provider_config Path\To\vaip_config.json
```
*Note: __vaip_config.json__ is located in the Ryzen AI setup package (refer to [Installation](#installation)). A CPU-only fallback sketch is shown after this list.*

- Test accuracy of the quantized model
```sh
python onnx_test.py --ipu --provider_config Path\To\vaip_config.json
```
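
If you only want to smoke-test the ONNX model without an IPU, you can create a CPU-only session. The snippet below is an illustrative sketch, not part of the repository; it reuses `pre_process` from `onnx_inference.py` and assumes a local image named `test.jpg`:

```python
# Illustrative CPU-only smoke test (no Ryzen AI / IPU required).
import cv2
import onnxruntime

from onnx_inference import pre_process  # letterbox + BGR->RGB + NCHW float32 in [0, 1]

session = onnxruntime.InferenceSession(
    "yolov3-8.onnx", providers=["CPUExecutionProvider"])
img = pre_process(cv2.imread("test.jpg"))  # (1, 3, H, W) letterboxed input
outputs = session.run(None, {session.get_inputs()[0].name: img})
print([o.shape for o in outputs])  # the three raw YOLO feature maps
```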

### Performance

| Metric | Accuracy on IPU |
| :----: | :----: |
| AP\@0.50:0.95 | 0.389 |


```bibtex
@misc{redmon2018yolov3,
  title={YOLOv3: An Incremental Improvement},
  author={Joseph Redmon and Ali Farhadi},
  year={2018},
  eprint={1804.02767},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
coco.names ADDED
@@ -0,0 +1,80 @@
1
+ person
2
+ bicycle
3
+ car
4
+ motorcycle
5
+ airplane
6
+ bus
7
+ train
8
+ truck
9
+ boat
10
+ traffic light
11
+ fire hydrant
12
+ stop sign
13
+ parking meter
14
+ bench
15
+ bird
16
+ cat
17
+ dog
18
+ horse
19
+ sheep
20
+ cow
21
+ elephant
22
+ bear
23
+ zebra
24
+ giraffe
25
+ backpack
26
+ umbrella
27
+ handbag
28
+ tie
29
+ suitcase
30
+ frisbee
31
+ skis
32
+ snowboard
33
+ sports ball
34
+ kite
35
+ baseball bat
36
+ baseball glove
37
+ skateboard
38
+ surfboard
39
+ tennis racket
40
+ bottle
41
+ wine glass
42
+ cup
43
+ fork
44
+ knife
45
+ spoon
46
+ bowl
47
+ banana
48
+ apple
49
+ sandwich
50
+ orange
51
+ broccoli
52
+ carrot
53
+ hot dog
54
+ pizza
55
+ donut
56
+ cake
57
+ chair
58
+ couch
59
+ potted plant
60
+ bed
61
+ dining table
62
+ toilet
63
+ tv
64
+ laptop
65
+ mouse
66
+ remote
67
+ keyboard
68
+ cell phone
69
+ microwave
70
+ oven
71
+ toaster
72
+ sink
73
+ refrigerator
74
+ book
75
+ clock
76
+ vase
77
+ scissors
78
+ teddy bear
79
+ hair drier
80
+ toothbrush
coco2017.data ADDED
@@ -0,0 +1,4 @@
1
+ classes=80
2
+ train=coco/train2017.txt
3
+ valid=coco/val2017.txt
4
+ names=coco.names
general_json2yolo.py ADDED
@@ -0,0 +1,179 @@
1
+ import numpy as np
2
+ from tqdm import tqdm
3
+ from pathlib import Path
4
+ import json
5
+ from collections import defaultdict
6
+ import sys
7
+ import pathlib
8
+
9
+ CURRENT_DIR = pathlib.Path(__file__).parent
10
+ sys.path.append(str(CURRENT_DIR))
11
+
12
+
13
+ def make_dirs(path='coco'):
14
+ # Create folders
15
+ path = Path(path)
16
+ for p in [path / 'labels']:
17
+ p.mkdir(parents=True, exist_ok=True) # make dir
18
+ return path
19
+
20
+
21
+ def coco91_to_coco80_class(): # converts 80-index (val2014) to 91-index (paper)
22
+ # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
23
+ x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None, 11, 12, 13, 14, 15, 16, 17,
24
+ 18, 19, 20, 21, 22, 23, None, 24, 25, None, None, 26, 27, 28, 29,
25
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, None, 40, 41, 42, 43, 44,
26
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, None,
27
+ 60, None, None, 61, None, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
28
+ 72, None, 73, 74, 75, 76, 77, 78, 79, None]
29
+ return x
30
+
31
+
32
+ def convert_coco_json(
33
+ json_dir='coco/annotations/',
34
+ use_segments=False,
35
+ cls91to80=False):
36
+ save_dir = make_dirs() # output directory
37
+ coco80 = coco91_to_coco80_class()
38
+ """Convert raw COCO dataset to YOLO style
39
+ """
40
+
41
+ # Import json
42
+ for json_file in sorted(Path(json_dir).resolve().glob('instances_val2017.json')):
43
+ fn = Path(save_dir) / 'labels' / \
44
+ json_file.stem.replace('instances_', '') # folder name
45
+ fn.mkdir()
46
+ with open(json_file) as f:
47
+ data = json.load(f)
48
+
49
+ # Create image dict
50
+ images = {'%g' % x['id']: x for x in data['images']}
51
+ # Create image-annotations dict
52
+ imgToAnns = defaultdict(list)
53
+ for ann in data['annotations']:
54
+ imgToAnns[ann['image_id']].append(ann)
55
+
56
+ txt_file = open(Path(save_dir / 'val2017').
57
+ with_suffix('.txt'), 'a')
58
+ # Write labels file
59
+ for img_id, anns in tqdm(
60
+ imgToAnns.items(), desc=f'Annotations {json_file}'):
61
+ img = images['%g' % img_id]
62
+ h, w, f = img['height'], img['width'], img['file_name']
63
+ bboxes = []
64
+ segments = []
65
+
66
+ txt_file.write(
67
+ './images/' + '/'.
68
+ join(img['coco_url'].split('/')[-2:]) + '\n')
69
+ for ann in anns:
70
+ if ann['iscrowd']:
71
+ continue
72
+ # The COCO box format is
73
+ # [top left x, top left y, width,
74
+ # height]
75
+ box = np.array(ann['bbox'], dtype=np.float64)
76
+ box[:2] += box[2:] / 2 # xy top-left corner to center
77
+ box[[0, 2]] /= w # normalize x
78
+ box[[1, 3]] /= h # normalize y
79
+ if box[2] <= 0 or box[3] <= 0: # if w <= 0 and h <= 0
80
+ continue
81
+ cls = coco80[ann['category_id'] - 1] \
82
+ if cls91to80 else ann['category_id'] - 1 # class
83
+ box = [cls] + box.tolist()
84
+ if box not in bboxes:
85
+ bboxes.append(box)
86
+ # Segments
87
+ if use_segments:
88
+ if len(ann['segmentation']) > 1:
89
+ s = merge_multi_segment(ann['segmentation'])
90
+ s = (np.concatenate(s, axis=0) /
91
+ np.array([w, h])).reshape(-1).tolist()
92
+ else:
93
+ s = [j for i in ann['segmentation']
94
+ for j in i] # all segments concatenated
95
+ s = (np.array(s).reshape(-1, 2) /
96
+ np.array([w, h])).reshape(-1).tolist()
97
+ s = [cls] + s
98
+ if s not in segments:
99
+ segments.append(s)
100
+
101
+ # Write
102
+ with open((fn / f).with_suffix('.txt'), 'a') as file:
103
+ for i in range(len(bboxes)):
104
+ # cls, box or segments
105
+ line = *(segments[i] if
106
+ use_segments else bboxes[i]),
107
+ file.write(('%g ' * len(line)).
108
+ rstrip() % line + '\n')
109
+ txt_file.close()
110
+
111
+
112
+ def min_index(arr1, arr2):
113
+ """Find a pair of indexes with the shortest distance.
114
+ Args:
115
+ arr1: (N, 2).
116
+ arr2: (M, 2).
117
+ Return:
118
+ a pair of indexes(tuple).
119
+ """
120
+ dis = ((arr1[:, None, :] - arr2[None, :, :]) ** 2).sum(-1)
121
+ return np.unravel_index(np.argmin(dis, axis=None), dis.shape)
122
+
123
+
124
+ def merge_multi_segment(segments):
125
+ """Merge multi segments to one list.
126
+ Find the coordinates with min distance between each segment,
127
+ then connect these coordinates with one thin line to merge all
128
+ segments into one.
129
+
130
+ Args:
131
+ segments(List(List)): original
132
+ segmentations in coco's json file.
133
+ like [segmentation1, segmentation2,...],
134
+ each segmentation is a list of coordinates.
135
+ """
136
+ s = []
137
+ segments = [np.array(i).reshape(-1, 2) for i in segments]
138
+ idx_list = [[] for _ in range(len(segments))]
139
+
140
+ # record the indexes with min distance between each segment
141
+ for i in range(1, len(segments)):
142
+ idx1, idx2 = min_index(segments[i - 1], segments[i])
143
+ idx_list[i - 1].append(idx1)
144
+ idx_list[i].append(idx2)
145
+
146
+ # use two round to connect all the segments
147
+ for k in range(2):
148
+ # forward connection
149
+ if k == 0:
150
+ for i, idx in enumerate(idx_list):
151
+ # middle segments have two indexes
152
+ # reverse the index of middle segments
153
+ if len(idx) == 2 and idx[0] > idx[1]:
154
+ idx = idx[::-1]
155
+ segments[i] = segments[i][::-1, :]
156
+
157
+ segments[i] = np.roll(segments[i], -idx[0], axis=0)
158
+ segments[i] = np.concatenate([segments[i],
159
+ segments[i][:1]])
160
+ # deal with the first segment and the last one
161
+ if i in [0, len(idx_list) - 1]:
162
+ s.append(segments[i])
163
+ else:
164
+ idx = [0, idx[1] - idx[0]]
165
+ s.append(segments[i][idx[0]:idx[1] + 1])
166
+
167
+ else:
168
+ for i in range(len(idx_list) - 1, -1, -1):
169
+ if i not in [0, len(idx_list) - 1]:
170
+ idx = idx_list[i]
171
+ nidx = abs(idx[1] - idx[0])
172
+ s.append(segments[i][nidx:])
173
+ return s
174
+
175
+
176
+ if __name__ == '__main__':
177
+ convert_coco_json('coco/annotations',
178
+ use_segments=False,
179
+ cls91to80=True)
onnx_inference.py ADDED
@@ -0,0 +1,151 @@
1
+ import onnxruntime
2
+ import argparse
3
+ import os
4
+ from utils import *
5
+
6
+
7
+ def pre_process(img):
8
+ """
9
+ Preprocessing part of YOLOv3: scales and pads the image as input to the network.
10
+ Args:
11
+ img (numpy.ndarray): H x W x C, image read with OpenCV
12
+ Returns:
13
+ padded_img (numpy.ndarray): preprocessed image to be fed to the network
14
+ """
15
+ img = letterbox(img, auto=False)[0]
16
+ # Convert
17
+ img = img.transpose((2, 0, 1))[::-1] # HWC to CHW, BGR to RGB
18
+ img = np.ascontiguousarray(img)
19
+ img = img.astype("float32")
20
+ img = img / 255.0
21
+ img = img[np.newaxis, :]
22
+ return img
23
+
24
+
25
+ def post_process(x, conf_thres=0.1, iou_thres=0.6, multi_label=True,
26
+ classes=None, agnostic=False):
27
+ """
28
+ Post-processing part of YOLOv3 for generating final results from outputs of the network.
29
+ Returns:
30
+ pred (torch.tensor): n x 6, dets[:,:4] -> boxes, dets[:,4] -> scores, dets[:,5] -> class indices
31
+ """
32
+ stride = [32, 16, 8]
33
+ anchors = [[10, 13, 16, 30, 33, 23],
34
+ [30, 61, 62, 45, 59, 119],
35
+ [116, 90, 156, 198, 373, 326]]
36
+ temp = [13, 26, 52]
37
+ res = []
38
+
39
+ def create_grids(ng=(13, 13)):
40
+ nx, ny = ng # x and y grid size
41
+ ng = torch.tensor(ng, dtype=torch.float)
42
+
43
+ # build xy offsets
44
+ yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
45
+ grid = torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
46
+
47
+ return grid
48
+
49
+ for i in range(3):
50
+ out = torch.from_numpy(x[i])
51
+
52
+ bs, _, ny, nx = out.shape # bs, 255, 13, 13
53
+
54
+ anchor = torch.Tensor(anchors[2 - i]).reshape(3, 2)
55
+ anchor_vec = anchor / stride[i]
56
+ anchor_wh = anchor_vec.view(1, 3, 1, 1, 2)
57
+
58
+ grid = create_grids((nx, ny))
59
+
60
+ out = out.view(
61
+ bs, 3, 85, temp[i], temp[i]).permute(
62
+ 0, 1, 3, 4, 2).contiguous() # prediction
63
+
64
+ io = out.clone()
65
+
66
+ io[..., :2] = torch.sigmoid(io[..., :2]) + grid
67
+ io[..., 2:4] = torch.exp(io[..., 2:4]) * anchor_wh
68
+ io[..., :4] *= stride[i]
69
+ torch.sigmoid_(io[..., 4:])
70
+
71
+ res.append(io.view(bs, -1, 85))
72
+
73
+ pred = non_max_suppression(torch.cat(res, 1), conf_thres,
74
+ iou_thres, multi_label=multi_label,
75
+ classes=classes, agnostic=agnostic)
76
+
77
+ return pred
78
+
79
+
80
+ if __name__ == '__main__':
81
+ parser = argparse.ArgumentParser(
82
+ prog='One image inference of onnx model')
83
+ parser.add_argument(
84
+ '--img',
85
+ type=str,
86
+ help='Path of input image')
87
+ parser.add_argument(
88
+ '--out',
89
+ type=str,
90
+ default='.',
91
+ help='Path of output image')
92
+ parser.add_argument(
93
+ "--ipu",
94
+ action="store_true",
95
+ help="Use IPU for inference.")
96
+ parser.add_argument(
97
+ "--provider_config",
98
+ type=str,
99
+ default="vaip_config.json",
100
+ help="Path of the config file for setting provider_options.")
101
+ parser.add_argument(
102
+ "--onnx_path",
103
+ type=str,
104
+ default="yolov3-8.onnx",
105
+ help="Path of the onnx model.")
106
+
107
+ opt = parser.parse_args()
108
+ with open('coco.names', 'r') as f:
109
+ names = f.read()
110
+
111
+ if opt.ipu:
112
+ providers = ["VitisAIExecutionProvider"]
113
+ provider_options = [{"config_file": opt.provider_config}]
114
+ else:
115
+ providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
116
+ provider_options = None
117
+
118
+ onnx_path = opt.onnx_path
119
+ onnx_model = onnxruntime.InferenceSession(
120
+ onnx_path, providers=providers, provider_options=provider_options)
121
+
122
+ path = opt.img
123
+ new_path = os.path.join(opt.out, "demo_infer.jpg")
124
+
125
+ conf_thres, iou_thres, classes, agnostic_nms, max_det = 0.25, \
126
+ 0.45, None, False, 1000
127
+
128
+ img0 = cv2.imread(path)
129
+ img = pre_process(img0)
130
+ onnx_input = {onnx_model.get_inputs()[0].name: img}
131
+ onnx_output = onnx_model.run(None, onnx_input)
132
+ pred = post_process(onnx_output, conf_thres,
133
+ iou_thres, multi_label=False,
134
+ classes=classes, agnostic=agnostic_nms)
135
+
136
+ colors = [[random.randint(0, 255) for _ in range(3)]
137
+ for _ in range(len(names))]
138
+ det = pred[0]
139
+ im0 = img0.copy()
140
+
141
+ if len(det):
142
+ # Rescale boxes from imgsz to im0 size
143
+ det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
144
+
145
+ # Write results
146
+ for *xyxy, conf, cls in reversed(det):
147
+ label = '%s %.2f' % (names[int(cls)], conf)
148
+ plot_one_box(xyxy, im0, label=label, color=colors[int(cls)])
149
+
150
+ # Stream results
151
+ cv2.imwrite(new_path, im0)
onnx_test.py ADDED
@@ -0,0 +1,1247 @@
1
+ import argparse
2
+ import json
3
+ import os
4
+ from pathlib import Path
5
+ from tqdm import tqdm
6
+ import glob
7
+ import math
8
+ from PIL import ExifTags, Image
9
+ import shutil
10
+ from torch.utils.data import DataLoader
11
+ from torch.utils.data import Dataset
12
+ from utils import *
13
+ import onnxruntime
14
+ import matplotlib.pyplot as plt
15
+
16
+ for orientation in ExifTags.TAGS.keys():
17
+ if ExifTags.TAGS[orientation] == 'Orientation':
18
+ break
19
+
20
+
21
+ def create_folder(path='./new_folder'):
22
+ # Create folder
23
+ if os.path.exists(path):
24
+ shutil.rmtree(path) # delete output folder
25
+ os.makedirs(path) # make new output folder
26
+
27
+
28
+ def exif_size(img):
29
+ # Returns exif-corrected PIL size
30
+ s = img.size # (width, height)
31
+ try:
32
+ rotation = dict(img._getexif().items())[orientation]
33
+ if rotation == 6: # rotation 270
34
+ s = (s[1], s[0])
35
+ elif rotation == 8: # rotation 90
36
+ s = (s[1], s[0])
37
+ except BaseException:
38
+ pass
39
+
40
+ return s
41
+
42
+
43
+ def ap_per_class(tp, conf, pred_cls, target_cls):
44
+ """ Compute the average precision, given the recall and precision curves.
45
+ Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
46
+ # Arguments
47
+ tp: True positives (nparray, nx1 or nx10).
48
+ conf: Objectness value from 0-1 (nparray).
49
+ pred_cls: Predicted object classes (nparray).
50
+ target_cls: True object classes (nparray).
51
+ # Returns
52
+ The average precision as computed in py-faster-rcnn.
53
+ """
54
+
55
+ # Sort by objectness
56
+ i = np.argsort(-conf)
57
+ tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
58
+
59
+ # Find unique classes
60
+ unique_classes = np.unique(target_cls)
61
+
62
+ # Create Precision-Recall curve and compute AP for each class
63
+ pr_score = 0.1
64
+ # score to evaluate P and R
65
+ # https://github.com/ultralytics/yolov3/issues/898
66
+ # number class, number iou thresholds (i.e. 10 for mAP0.5...0.95)
67
+ s = [unique_classes.shape[0], tp.shape[1]]
68
+ ap, p, r = np.zeros(s), np.zeros(s), np.zeros(s)
69
+ for ci, c in enumerate(unique_classes):
70
+ i = pred_cls == c
71
+ n_gt = (target_cls == c).sum() # Number of ground truth objects
72
+ n_p = i.sum() # Number of predicted objects
73
+
74
+ if n_p == 0 or n_gt == 0:
75
+ continue
76
+ else:
77
+ # Accumulate FPs and TPs
78
+ fpc = (1 - tp[i]).cumsum(0)
79
+ tpc = tp[i].cumsum(0)
80
+
81
+ # Recall
82
+ recall = tpc / (n_gt + 1e-16) # recall curve
83
+ # r at pr_score, negative x, xp because xp decreases
84
+ r[ci] = np.interp(-pr_score, -conf[i], recall[:, 0])
85
+
86
+ # Precision
87
+ precision = tpc / (tpc + fpc) # precision curve
88
+ p[ci] = np.interp(-pr_score, -conf[i],
89
+ precision[:, 0]) # p at pr_score
90
+
91
+ # AP from recall-precision curve
92
+ for j in range(tp.shape[1]):
93
+ ap[ci, j] = compute_ap(recall[:, j], precision[:, j])
94
+
95
+ # Plot
96
+ # fig, ax = plt.subplots(1, 1, figsize=(5, 5))
97
+ # ax.plot(recall, precision)
98
+ # ax.set_xlabel('Recall')
99
+ # ax.set_ylabel('Precision')
100
+ # ax.set_xlim(0, 1.01)
101
+ # ax.set_ylim(0, 1.01)
102
+ # fig.tight_layout()
103
+ # fig.savefig('PR_curve.png', dpi=300)
104
+
105
+ # Compute F1 score (harmonic mean of precision and recall)
106
+ f1 = 2 * p * r / (p + r + 1e-16)
107
+
108
+ return p, r, ap, f1, unique_classes.astype('int32')
109
+
110
+
111
+ def time_synchronized():
112
+ torch.cuda.synchronize() if torch.cuda.is_available() else None
113
+ return time.time()
114
+
115
+
116
+ def plot_images(
117
+ images,
118
+ targets,
119
+ paths=None,
120
+ fname='images.jpg',
121
+ names=None,
122
+ max_size=640,
123
+ max_subplots=16):
124
+ tl = 3 # line thickness
125
+ tf = max(tl - 1, 1) # font thickness
126
+ if os.path.isfile(fname): # do not overwrite
127
+ return None
128
+
129
+ if isinstance(images, torch.Tensor):
130
+ images = images.cpu().numpy()
131
+
132
+ if isinstance(targets, torch.Tensor):
133
+ targets = targets.cpu().numpy()
134
+
135
+ # un-normalise
136
+ if np.max(images[0]) <= 1:
137
+ images *= 255
138
+
139
+ bs, _, h, w = images.shape # batch size, _, height, width
140
+ bs = min(bs, max_subplots) # limit plot images
141
+ ns = np.ceil(bs ** 0.5) # number of subplots (square)
142
+
143
+ # Check if we should resize
144
+ scale_factor = max_size / max(h, w)
145
+ if scale_factor < 1:
146
+ h = math.ceil(scale_factor * h)
147
+ w = math.ceil(scale_factor * w)
148
+
149
+ # Empty array for output
150
+ mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8)
151
+
152
+ # Fix class - colour map
153
+ prop_cycle = plt.rcParams['axes.prop_cycle']
154
+
155
+ # https://stackoverflow.com/questions/51350872/python-from-color-name-to-rgb
156
+ def hex2rgb(h):
157
+ return tuple(
158
+ int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))
159
+
160
+ color_lut = [hex2rgb(h) for h in prop_cycle.by_key()['color']]
161
+
162
+ for i, img in enumerate(images):
163
+ if i == max_subplots: # if last batch has fewer images than we expect
164
+ break
165
+
166
+ block_x = int(w * (i // ns))
167
+ block_y = int(h * (i % ns))
168
+
169
+ img = img.transpose(1, 2, 0)
170
+ if scale_factor < 1:
171
+ img = cv2.resize(img, (w, h))
172
+
173
+ mosaic[block_y:block_y + h, block_x:block_x + w, :] = img
174
+ if len(targets) > 0:
175
+ image_targets = targets[targets[:, 0] == i]
176
+ boxes = xywh2xyxy(image_targets[:, 2:6]).T
177
+ classes = image_targets[:, 1].astype('int')
178
+ gt = image_targets.shape[1] == 6
179
+ # ground truth if no conf column
180
+ # check for confidence presence (gt vs pred)
181
+ conf = None if gt else image_targets[:, 6]
182
+
183
+ boxes[[0, 2]] *= w
184
+ boxes[[0, 2]] += block_x
185
+ boxes[[1, 3]] *= h
186
+ boxes[[1, 3]] += block_y
187
+ for j, box in enumerate(boxes.T):
188
+ cls = int(classes[j])
189
+ color = color_lut[cls % len(color_lut)]
190
+ cls = names[cls] if names else cls
191
+ if gt or conf[j] > 0.3: # 0.3 conf thresh
192
+ label = '%s' % cls if gt else '%s %.1f' % (cls, conf[j])
193
+ plot_one_box(box, mosaic, label=label,
194
+ color=color, line_thickness=tl)
195
+
196
+ # Draw image filename labels
197
+ if paths is not None:
198
+ label = os.path.basename(paths[i])[:40] # trim to 40 char
199
+ t_size = cv2.getTextSize(
200
+ label, 0, fontScale=tl / 3, thickness=tf)[0]
201
+ cv2.putText(mosaic, label, (block_x +
202
+ 5, block_y +
203
+ t_size[1] +
204
+ 5), 0, tl /
205
+ 3, [220, 220, 220], thickness=tf, lineType=cv2.LINE_AA)
206
+
207
+ # Image border
208
+ cv2.rectangle(mosaic, (block_x, block_y), (block_x + w,
209
+ block_y + h), (255, 255, 255), thickness=3)
210
+
211
+ if fname is not None:
212
+ mosaic = cv2.resize(mosaic,
213
+ (int(ns * w * 0.5),
214
+ int(ns * h * 0.5)),
215
+ interpolation=cv2.INTER_AREA)
216
+ cv2.imwrite(fname, cv2.cvtColor(mosaic, cv2.COLOR_BGR2RGB))
217
+
218
+ return mosaic
219
+
220
+
221
+ def random_affine(img, targets=(), degrees=10, translate=.1,
222
+ scale=.1, shear=10, border=0):
223
+ # targets = [cls, xyxy]
224
+
225
+ height = img.shape[0] + border * 2
226
+ width = img.shape[1] + border * 2
227
+
228
+ # Rotation and Scale
229
+ R = np.eye(3)
230
+ a = random.uniform(-degrees, degrees)
231
+ # a += random.choice([-180, -90, 0, 90])
232
+ # add 90deg rotations to small rotations
233
+ s = random.uniform(1 - scale, 1 + scale)
234
+ # s = 2 ** random.uniform(-scale, scale)
235
+ R[:2] = cv2.getRotationMatrix2D(angle=a,
236
+ center=(img.shape[1] / 2,
237
+ img.shape[0] / 2),
238
+ scale=s)
239
+
240
+ # Translation
241
+ T = np.eye(3)
242
+ T[0, 2] = (random.uniform(-translate, translate) *
243
+ img.shape[0] + border) # x translation (pixels)
244
+ T[1, 2] = (random.uniform(-translate, translate) *
245
+ img.shape[1] + border) # y translation (pixels)
246
+
247
+ # Shear
248
+ S = np.eye(3)
249
+ S[0, 1] = math.tan(random.uniform(-shear, shear) *
250
+ math.pi / 180) # x shear (deg)
251
+ S[1, 0] = math.tan(random.uniform(-shear, shear) *
252
+ math.pi / 180) # y shear (deg)
253
+
254
+ # Combined rotation matrix
255
+ M = S @ T @ R # ORDER IS IMPORTANT HERE!!
256
+ if (border != 0) or (M != np.eye(3)).any(): # image changed
257
+ img = cv2.warpAffine(img, M[:2], dsize=(width, height),
258
+ flags=cv2.INTER_LINEAR,
259
+ borderValue=(114, 114, 114))
260
+
261
+ # Transform label coordinates
262
+ n = len(targets)
263
+ if n:
264
+ # warp points
265
+ xy = np.ones((n * 4, 3))
266
+ xy[:, :2] = (targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].
267
+ reshape(n * 4, 2))
268
+ # x1y1, x2y2, x1y2, x2y1
269
+ xy = (xy @ M.T)[:, :2].reshape(n, 8)
270
+
271
+ # create new boxes
272
+ x = xy[:, [0, 2, 4, 6]]
273
+ y = xy[:, [1, 3, 5, 7]]
274
+ xy = np.concatenate((x.min(1), y.min(1), x.max(1),
275
+ y.max(1))).reshape(4, n).T
276
+
277
+ # reject warped points outside of image
278
+ xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
279
+ xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
280
+ w = xy[:, 2] - xy[:, 0]
281
+ h = xy[:, 3] - xy[:, 1]
282
+ area = w * h
283
+ area0 = ((targets[:, 3] - targets[:, 1]) *
284
+ (targets[:, 4] - targets[:, 2]))
285
+ ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))
286
+ # aspect ratio
287
+ i = (w > 4) & (h > 4) & (area / (area0 * s + 1e-16)
288
+ > 0.2) & (ar < 10)
289
+
290
+ targets = targets[i]
291
+ targets[:, 1:5] = xy[i]
292
+
293
+ return img, targets
294
+
295
+
296
+ def output_to_target(output, width, height):
297
+ """
298
+ Convert a YOLO model output to target format
299
+ [batch_id, class_id, x, y, w, h, conf]
300
+ """
301
+ if isinstance(output, torch.Tensor):
302
+ output = output.cpu().numpy()
303
+
304
+ targets = []
305
+ for i, o in enumerate(output):
306
+ if o is not None:
307
+ for pred in o:
308
+ box = pred[:4]
309
+ w = (box[2] - box[0]) / width
310
+ h = (box[3] - box[1]) / height
311
+ x = box[0] / width + w / 2
312
+ y = box[1] / height + h / 2
313
+ conf = pred[4]
314
+ cls = int(pred[5])
315
+
316
+ targets.append([i, cls, x, y, w, h, conf])
317
+
318
+ return np.array(targets)
319
+
320
+
321
+ def xyxy2xywh(x):
322
+ # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where
323
+ # xy1=top-left, xy2=bottom-right
324
+ y = torch.zeros_like(x) if isinstance(
325
+ x, torch.Tensor) else np.zeros_like(x)
326
+ y[:, 0] = (x[:, 0] + x[:, 2]) / 2 # x center
327
+ y[:, 1] = (x[:, 1] + x[:, 3]) / 2 # y center
328
+ y[:, 2] = x[:, 2] - x[:, 0] # width
329
+ y[:, 3] = x[:, 3] - x[:, 1] # height
330
+ return y
331
+
332
+
333
+ def coco80_to_coco91_class(): # converts 80-index (val2014) to 91-index (paper)
334
+ # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
335
+ x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20,
336
+ 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
337
+ 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
338
+ 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79,
339
+ 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]
340
+ return x
341
+
342
+
343
+ def check_file(file):
344
+ # Searches for file if not found locally
345
+ if os.path.isfile(file):
346
+ return file
347
+ else:
348
+ files = glob.glob('./**/' + file, recursive=True) # find file
349
+ assert len(files), 'File Not Found: %s' % file # assert file was found
350
+ return files[0] # return first file if multiple found
351
+
352
+
353
+ def load_classes(path):
354
+ # Loads *.names file at 'path'
355
+ with open(path, 'r') as f:
356
+ names = f.read().split('\n')
357
+ # filter removes empty strings (such as last line)
358
+ return list(filter(None, names))
359
+
360
+
361
+ def load_image(self, index):
362
+ # loads 1 image from dataset, returns img, original hw, resized hw
363
+ img = self.imgs[index]
364
+ if img is None: # not cached
365
+ path = self.img_files[index]
366
+ img = cv2.imread(path) # BGR
367
+ assert img is not None, 'Image Not Found ' + path
368
+ h0, w0 = img.shape[:2] # orig hw
369
+ r = self.img_size / max(h0, w0) # resize image to img_size
370
+ if r != 1:
371
+ # always resize down, only resize up if training with augmentation
372
+ interp = cv2.INTER_AREA if r < 1 and not self.augment \
373
+ else cv2.INTER_LINEAR
374
+ img = cv2.resize(img, (int(w0 * r), int(h0 * r)),
375
+ interpolation=interp)
376
+ return img, (h0, w0), img.shape[:2] # img, hw_original, hw_resized
377
+ else:
378
+ # img, hw_original, hw_resized
379
+ return self.imgs[index], self.img_hw0[index], self.img_hw[index]
380
+
381
+
382
+ def load_mosaic(self, index):
383
+ # loads images in a mosaic
384
+
385
+ labels4 = []
386
+ s = self.img_size
387
+ xc, yc = [int(random.uniform(s * 0.5, s * 1.5))
388
+ for _ in range(2)] # mosaic center x, y
389
+ indices = [index] + [random.randint(0, len(self.labels) - 1)
390
+ for _ in range(3)] # 3 additional image indices
391
+ for i, index in enumerate(indices):
392
+ # Load image
393
+ img, _, (h, w) = load_image(self, index)
394
+
395
+ # place img in img4
396
+ if i == 0: # top left
397
+ img4 = np.full((s * 2, s * 2, img.shape[2]),
398
+ 114, dtype=np.uint8)
399
+ # base image with 4 tiles
400
+ x1a, y1a, x2a, y2a = (max(xc - w, 0),
401
+ max(yc - h, 0), xc, yc)
402
+ # xmin, ymin, xmax, ymax (large image)
403
+ x1b, y1b, x2b, y2b = (w - (x2a - x1a), h -
404
+ (y2a - y1a), w, h)
405
+ # xmin, ymin, xmax, ymax (small image)
406
+ elif i == 1: # top right
407
+ x1a, y1a, x2a, y2a = (xc, max(yc - h, 0),
408
+ min(xc + w, s * 2), yc)
409
+ x1b, y1b, x2b, y2b = (0, h - (y2a - y1a),
410
+ min(w, x2a - x1a), h)
411
+ elif i == 2: # bottom left
412
+ x1a, y1a, x2a, y2a = (max(xc - w, 0), yc,
413
+ xc, min(s * 2, yc + h))
414
+ x1b, y1b, x2b, y2b = (w - (x2a - x1a), 0,
415
+ max(xc, w), min(y2a - y1a, h))
416
+ elif i == 3: # bottom right
417
+ x1a, y1a, x2a, y2a = xc, yc, min(xc + w,
418
+ s * 2), min(s * 2, yc + h)
419
+ x1b, y1b, x2b, y2b = (0, 0,
420
+ min(w, x2a - x1a), min(y2a - y1a, h))
421
+
422
+ img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
423
+ # img4[ymin:ymax, xmin:xmax]
424
+ padw = x1a - x1b
425
+ padh = y1a - y1b
426
+
427
+ # Labels
428
+ x = self.labels[index]
429
+ labels = x.copy()
430
+ if x.size > 0: # Normalized xywh to pixel xyxy format
431
+ labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw
432
+ labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh
433
+ labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw
434
+ labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh
435
+ labels4.append(labels)
436
+
437
+ # Concat/clip labels
438
+ if len(labels4):
439
+ labels4 = np.concatenate(labels4, 0)
440
+ # np.clip(labels4[:, 1:] - s / 2, 0, s, out=labels4[:, 1:])
441
+ # use with center crop
442
+ np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])
443
+ # use with random_affine
444
+
445
+ # Augment
446
+ # img4 = img4[s // 2: int(s * 1.5), s // 2:int(s * 1.5)]
447
+ # center crop (WARNING, requires box pruning)
448
+ img4, labels4 = random_affine(img4, labels4,
449
+ degrees=self.hyp['degrees'],
450
+ translate=self.hyp['translate'],
451
+ scale=self.hyp['scale'],
452
+ shear=self.hyp['shear'],
453
+ border=-s // 2) # border to remove
454
+
455
+ return img4, labels4
456
+
457
+
458
+ def compute_ap(recall, precision):
459
+ """ Compute the average precision, given the recall and precision curves.
460
+ Source: https://github.com/rbgirshick/py-faster-rcnn.
461
+ # Arguments
462
+ recall: The recall curve (list).
463
+ precision: The precision curve (list).
464
+ # Returns
465
+ The average precision as computed in py-faster-rcnn.
466
+ """
467
+
468
+ # Append sentinel values to beginning and end
469
+ mrec = np.concatenate(([0.], recall, [min(recall[-1] + 1E-3, 1.)]))
470
+ mpre = np.concatenate(([0.], precision, [0.]))
471
+
472
+ # Compute the precision envelope
473
+ mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
474
+
475
+ # Integrate area under curve
476
+ method = 'interp' # methods: 'continuous', 'interp'
477
+ if method == 'interp':
478
+ x = np.linspace(0, 1, 101) # 101-point interp (COCO)
479
+ ap = np.trapz(np.interp(x, mrec, mpre), x) # integrate
480
+ else: # 'continuous'
481
+ # points where x axis (recall) changes
482
+ i = np.where(mrec[1:] != mrec[:-1])[0]
483
+ ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) # area under curve
484
+
485
+ return ap
486
+
487
+
488
+ def augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5):
489
+ r = (np.random.uniform(-1, 1, 3) *
490
+ [hgain, sgain, vgain] + 1) # random gains
491
+ hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
492
+ dtype = img.dtype # uint8
493
+
494
+ x = np.arange(0, 256, dtype=np.int16)
495
+ lut_hue = ((x * r[0]) % 180).astype(dtype)
496
+ lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
497
+ lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
498
+
499
+ img_hsv = cv2.merge((cv2.LUT(hue, lut_hue),
500
+ cv2.LUT(sat, lut_sat),
501
+ cv2.LUT(val, lut_val))).astype(dtype)
502
+ cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
503
+ # no return needed
504
+
505
+
506
+ class LoadImagesAndLabels(Dataset): # for training/testing
507
+ def __init__(
508
+ self,
509
+ path,
510
+ img_size=416,
511
+ batch_size=16,
512
+ augment=False,
513
+ hyp=None,
514
+ rect=False,
515
+ image_weights=False,
516
+ cache_images=False,
517
+ single_cls=False,
518
+ pad=0.0):
519
+ try:
520
+ path = str(Path(path)) # os-agnostic
521
+ parent = str(Path(path).parent) + os.sep
522
+ if os.path.isfile(path): # file
523
+ with open(path, 'r') as f:
524
+ f = f.read().splitlines()
525
+ # local to global path
526
+ f = [
527
+ x.replace(
528
+ './',
529
+ parent) if x.startswith('./') else x for x in f]
530
+ elif os.path.isdir(path): # folder
531
+ f = glob.iglob(path + os.sep + '*.*')
532
+ else:
533
+ raise Exception('%s does not exist' % path)
534
+ self.img_files = [x.replace(
535
+ '/', os.sep) for x in f if
536
+ os.path.splitext(x)[-1].lower() in img_formats]
537
+
538
+ except BaseException:
539
+ raise Exception(
540
+ 'Error loading data from %s. See %s' %
541
+ (path, help_url))
542
+
543
+ n = len(self.img_files)
544
+ assert n > 0, 'No images found in %s. See %s' % (path, help_url)
545
+ bi = np.floor(np.arange(n) / batch_size).astype(int) # batch index
546
+ nb = bi[-1] + 1 # number of batches
547
+
548
+ self.n = n # number of images
549
+ self.batch = bi # batch index of image
550
+ self.img_size = img_size
551
+ self.augment = augment
552
+ self.hyp = hyp
553
+ self.image_weights = image_weights
554
+ self.rect = False if image_weights else rect
555
+ # load 4 images at a time into a mosaic (only during training)
556
+ self.mosaic = self.augment and not self.rect
557
+
558
+ # Define labels
559
+ self.label_files = [x.replace('images', 'labels').replace(
560
+ os.path.splitext(x)[-1], '.txt') for x in self.img_files]
561
+
562
+ # Read image shapes (wh)
563
+ sp = path.replace('.txt', '') + '.shapes' # shapefile path
564
+ try:
565
+ with open(sp, 'r') as f: # read existing shapefile
566
+ s = [x.split() for x in f.read().splitlines()]
567
+ assert len(s) == n, 'Shapefile out of sync'
568
+ except BaseException:
569
+ s = [exif_size(Image.open(f)) for f in tqdm(
570
+ self.img_files,
571
+ desc='Reading image shapes')]
572
+ np.savetxt(sp, s, fmt='%g') # overwrites existing (if any)
573
+
574
+ self.shapes = np.array(s, dtype=np.float64)
575
+
576
+ # Rectangular Training
577
+ # https://github.com/ultralytics/yolov3/issues/232
578
+ if self.rect:
579
+ # Sort by aspect ratio
580
+ s = self.shapes # wh
581
+ ar = s[:, 1] / s[:, 0] # aspect ratio
582
+ irect = ar.argsort()
583
+ self.img_files = [self.img_files[i] for i in irect]
584
+ self.label_files = [self.label_files[i] for i in irect]
585
+ self.shapes = s[irect] # wh
586
+ ar = ar[irect]
587
+
588
+ # Set training image shapes
589
+ shapes = [[1, 1]] * nb
590
+ for i in range(nb):
591
+ ari = ar[bi == i]
592
+ mini, maxi = ari.min(), ari.max()
593
+ if maxi < 1:
594
+ shapes[i] = [maxi, 1]
595
+ elif mini > 1:
596
+ shapes[i] = [1, 1 / mini]
597
+
598
+ self.batch_shapes = np.ceil(
599
+ np.array(shapes) * img_size / 32. + pad).astype(int) * 32
600
+
601
+ # Cache labels
602
+ self.imgs = [None] * n
603
+ self.labels = [np.zeros((0, 5), dtype=np.float32)] * n
604
+ create_datasubset, extract_bounding_boxes, labels_loaded = \
605
+ False, False, False
606
+ # number missing, found, empty, datasubset, duplicate
607
+ nm, nf, ne, ns, nd = 0, 0, 0, 0, 0
608
+ # saved labels in *.npy file
609
+ np_labels_path = str(Path(self.label_files[0]).parent) + '.npy'
610
+ if os.path.isfile(np_labels_path):
611
+ s = np_labels_path # print string
612
+
613
+ print(np_labels_path)
614
+
615
+ x = np.load(np_labels_path, allow_pickle=True)
616
+ if len(x) == n:
617
+ self.labels = x
618
+ labels_loaded = True
619
+ else:
620
+ s = path.replace('images', 'labels')
621
+
622
+ pbar = tqdm(self.label_files)
623
+ for i, file in enumerate(pbar):
624
+ if labels_loaded:
625
+ l = self.labels[i]
626
+ # np.savetxt(file, l, '%g') # save *.txt from *.npy file
627
+ else:
628
+ try:
629
+ with open(file, 'r') as f:
630
+ l = np.array(
631
+ [x.split() for x in f.read().splitlines()],
632
+ dtype=np.float32)
633
+ except BaseException:
634
+ # print('missing labels for image %s' % self.img_files[i])
635
+ # # file missing
636
+ nm += 1
637
+ continue
638
+
639
+ if l.shape[0]:
640
+ assert l.shape[1] == 5, '> 5 label columns: %s' % file
641
+ assert (l >= 0).all(), 'negative labels: %s' % file
642
+ assert (l[:, 1:] <= 1).all(
643
+ ), 'non-normalized or out of bounds coordinate labels: %s' % file
644
+ if np.unique(
645
+ l, axis=0).shape[0] < l.shape[0]: # duplicate rows
646
+ # print('WARNING: duplicate rows in %s' %
647
+ # self.label_files[i]) # duplicate rows
648
+ nd += 1
649
+ if single_cls:
650
+ l[:, 0] = 0 # force dataset into single-class mode
651
+ self.labels[i] = l
652
+ nf += 1 # file found
653
+
654
+ # Create subdataset (a smaller dataset)
655
+ if create_datasubset and ns < 1E4:
656
+ if ns == 0:
657
+ create_folder(path='./datasubset')
658
+ os.makedirs('./datasubset/images')
659
+ exclude_classes = 43
660
+ if exclude_classes not in l[:, 0]:
661
+ ns += 1
662
+ # shutil.copy(src=self.img_files[i],
663
+ # dst='./datasubset/images/') # copy image
664
+ with open('./datasubset/images.txt', 'a') as f:
665
+ f.write(self.img_files[i] + '\n')
666
+
667
+ # Extract object detection boxes for a second stage classifier
668
+ if extract_bounding_boxes:
669
+ p = Path(self.img_files[i])
670
+ img = cv2.imread(str(p))
671
+ h, w = img.shape[:2]
672
+ for j, x in enumerate(l):
673
+ f = '%s%sclassifier%s%g_%g_%s' % (
674
+ p.parent.parent, os.sep, os.sep, x[0], j, p.name)
675
+ if not os.path.exists(Path(f).parent):
676
+ # make new output folder
677
+ os.makedirs(Path(f).parent)
678
+
679
+ b = x[1:] * [w, h, w, h] # box
680
+ b[2:] = b[2:].max() # rectangle to square
681
+ b[2:] = b[2:] * 1.3 + 30 # pad
682
+ b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(int)
683
+
684
+ # clip boxes outside of image
685
+ b[[0, 2]] = np.clip(b[[0, 2]], 0, w)
686
+ b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
687
+ assert cv2.imwrite(
688
+ f, img[b[1]:b[3], b[0]:b[2]]), \
689
+ 'Failure extracting classifier boxes'
690
+ else:
691
+ # print('empty labels for image %s' % self.img_files[i]) #
692
+ # file empty
693
+ ne += 1
694
+ # os.system("rm '%s' '%s'" % (self.img_files[i],
695
+ # self.label_files[i])) # remove
696
+
697
+ pbar.desc = 'Caching labels %s (%g found, %g missing, %g empty,\
698
+ %g duplicate, for %g images)' % (
699
+ s, nf, nm, ne, nd, n)
700
+ assert nf > 0 or n == 20288, 'No labels found in %s. See %s' % (
701
+ os.path.dirname(file) + os.sep, help_url)
702
+ if not labels_loaded and n > 1000:
703
+ print(
704
+ 'Saving labels to %s for faster future loading' %
705
+ np_labels_path)
706
+ # np.save(np_labels_path, self.labels) # save for next time
707
+
708
+ # Cache images into memory for faster training (WARNING: large datasets
709
+ # may exceed system RAM)
710
+ if cache_images: # if training
711
+ gb = 0 # Gigabytes of cached images
712
+ pbar = tqdm(range(len(self.img_files)), desc='Caching images')
713
+ self.img_hw0, self.img_hw = [None] * n, [None] * n
714
+ for i in pbar: # max 10k images
715
+ self.imgs[i], self.img_hw0[i], self.img_hw[i] = load_image(
716
+ self, i) # img, hw_original, hw_resized
717
+ gb += self.imgs[i].nbytes
718
+ pbar.desc = 'Caching images (%.1fGB)' % (gb / 1E9)
719
+
720
+ def __len__(self):
721
+ return len(self.img_files)
722
+
723
+ def __getitem__(self, index):
724
+ if self.image_weights:
725
+ index = self.indices[index]
726
+
727
+ hyp = self.hyp
728
+ if self.mosaic:
729
+ # Load mosaic
730
+ img, labels = load_mosaic(self, index)
731
+ shapes = None
732
+
733
+ else:
734
+ # Load image
735
+ img, (h0, w0), (h, w) = load_image(self, index)
736
+
737
+ # Letterbox
738
+ shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size # final letterboxed shape
739
+ img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
740
+ shapes = (h0, w0), ((h / h0, w / w0), pad) # for COCO mAP rescaling
741
+
742
+ # Load labels
743
+ labels = []
744
+ x = self.labels[index]
745
+ if x.size > 0:
746
+ # Normalized xywh to pixel xyxy format
747
+ labels = x.copy()
748
+ labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0] # pad width
749
+ labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1] # pad height
750
+ labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0]
751
+ labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1]
752
+
753
+ if self.augment:
754
+ # Augment imagespace
755
+ if not self.mosaic:
756
+ img, labels = random_affine(img, labels,
757
+ degrees=hyp['degrees'],
758
+ translate=hyp['translate'],
759
+ scale=hyp['scale'],
760
+ shear=hyp['shear'])
761
+
762
+ # Augment colorspace
763
+ augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])
764
+
765
+ # Apply cutouts
766
+ # if random.random() < 0.9:
767
+ # labels = cutout(img, labels)
768
+
769
+ nL = len(labels) # number of labels
770
+ if nL:
771
+ # convert xyxy to xywh
772
+ labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])
773
+
774
+ # Normalize coordinates 0 - 1
775
+ labels[:, [2, 4]] /= img.shape[0] # height
776
+ labels[:, [1, 3]] /= img.shape[1] # width
777
+
778
+ if self.augment:
779
+ # random left-right flip
780
+ lr_flip = True
781
+ if lr_flip and random.random() < 0.5:
782
+ img = np.fliplr(img)
783
+ if nL:
784
+ labels[:, 1] = 1 - labels[:, 1]
785
+
786
+ # random up-down flip
787
+ ud_flip = False
788
+ if ud_flip and random.random() < 0.5:
789
+ img = np.flipud(img)
790
+ if nL:
791
+ labels[:, 2] = 1 - labels[:, 2]
792
+
793
+ labels_out = torch.zeros((nL, 6))
794
+ if nL:
795
+ labels_out[:, 1:] = torch.from_numpy(labels)
796
+
797
+ # Convert
798
+ img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416
799
+ img = np.ascontiguousarray(img)
800
+
801
+ return torch.from_numpy(img), labels_out, self.img_files[index], shapes
802
+
803
+ @staticmethod
804
+ def collate_fn(batch):
805
+ img, label, path, shapes = zip(*batch) # transposed
806
+ for i, l in enumerate(label):
807
+ l[:, 0] = i # add target image index for build_targets()
808
+ return torch.stack(img, 0), torch.cat(label, 0), path, shapes
809
+
810
+
811
+ def parse_data_cfg(path):
812
+ # Parses the data configuration file
813
+ if not os.path.exists(path) and os.path.exists(
814
+ 'data' + os.sep + path): # add data/ prefix if omitted
815
+ path = 'data' + os.sep + path
816
+
817
+ with open(path, 'r') as f:
818
+ lines = f.readlines()
819
+
820
+ options = dict()
821
+ for line in lines:
822
+ line = line.strip()
823
+ if line == '' or line.startswith('#'):
824
+ continue
825
+ key, val = line.split('=')
826
+ options[key.strip()] = val.strip()
827
+
828
+ return options
829
+
830
+
831
+ def create_grids(ng=(13, 13), device='cpu'):
832
+ nx, ny = ng # x and y grid size
833
+ ng = torch.tensor(ng, dtype=torch.float)
834
+
835
+ # build xy offsets
836
+ yv, xv = torch.meshgrid(
837
+ [torch.arange(ny, device=device), torch.arange(nx, device=device)])
838
+ grid = torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
839
+
840
+ return grid
841
+
842
+
843
+ def post_process(x):
844
+ stride = [32, 16, 8]
845
+ anchors = [[10, 13, 16, 30, 33, 23],
846
+ [30, 61, 62, 45, 59, 119],
847
+ [116, 90, 156, 198, 373, 326]]
848
+ temp = [13, 26, 52]
849
+
850
+ res = []
851
+ for i in range(3):
852
+ out = torch.from_numpy(x[i]) if not torch.is_tensor(x[i]) else x[i]
853
+
854
+ bs, _, ny, nx = out.shape # bs, 255, 13, 13
855
+
856
+ anchor = torch.Tensor(anchors[2 - i]).reshape(3, 2)
857
+ anchor_vec = anchor / stride[i]
858
+ anchor_wh = anchor_vec.view(1, 3, 1, 1, 2)
859
+
860
+ grid = create_grids((nx, ny))
861
+
862
+ # p.view(bs, 255, 13, 13) -- > (bs, 3, 13, 13, 85) # (bs, anchors,
863
+ # grid, grid, classes + xywh)
864
+ out = out.view(
865
+ bs, 3, 85, temp[i], temp[i]).permute(
866
+ 0, 1, 3, 4, 2).contiguous() # prediction
867
+
868
+ io = out.clone() # inference output
869
+
870
+ io[..., :2] = torch.sigmoid(io[..., :2]) + grid # xy
871
+ io[..., 2:4] = torch.exp(io[..., 2:4]) * anchor_wh # wh yolo method
872
+ io[..., :4] *= stride[i]
873
+ torch.sigmoid_(io[..., 4:])
874
+
875
+ res.append(io.view(bs, -1, 85))
876
+ return torch.cat(res, 1), x
877
+
878
+
879
+ def test(data,
880
+ batch_size=32,
881
+ imgsz=416,
882
+ conf_thres=0.001,
883
+ iou_thres=0.6, # for nms
884
+ save_json=False,
885
+ single_cls=False,
886
+ augment=False,
887
+ model=None,
888
+ dataloader=None,
889
+ multi_label=True,
890
+ names='data/coco.names',
891
+ onnx_runtime=True,
892
+ onnx_weights="yolov3-8",
893
+ ipu=False,
894
+ provider_config='vaip_config.json'):
895
+ """
896
+ COCO average precision (AP) evaluation. Runs inference over the test dataset
897
+ and evaluates the results with the COCO API.
899
+
900
+ device = torch.device('cpu')
901
+ verbose = False
902
+ if isinstance(onnx_weights, list):
903
+ onnx_weights = onnx_weights[0]
904
+
905
+ if ipu:
906
+ providers = ["VitisAIExecutionProvider"]
907
+ provider_options = [{"config_file": provider_config}]
908
+ else:
909
+ providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
910
+ provider_options = None
911
+
912
+ onnx_model = onnxruntime.InferenceSession(
913
+ onnx_weights,
914
+ providers=providers,
915
+ provider_options=provider_options)
916
+
917
+ # Configure run
918
+ data = parse_data_cfg(data)
919
+ nc = 1 if single_cls else int(data['classes']) # number of classes
920
+ path = data['valid'] # path to test images
921
+ names = load_classes(data['names']) # class names
922
+ iouv = torch.linspace(0.5, 0.95, 10).to(
923
+ device) # iou vector for mAP@0.5:0.95
924
+ iouv = iouv[0].view(1) # comment for mAP@0.5:0.95
925
+ niou = iouv.numel()
926
+
927
+ # Dataloader
928
+ if dataloader is None:
929
+ dataset = LoadImagesAndLabels(
930
+ path,
931
+ imgsz,
932
+ batch_size,
933
+ rect=False,
934
+ single_cls=opt.single_cls,
935
+ pad=0.5)
936
+ batch_size = min(batch_size, len(dataset))
937
+ dataloader = DataLoader(dataset,
938
+ batch_size=batch_size,
939
+ num_workers=min([os.cpu_count(),
940
+ batch_size if
941
+ batch_size > 1 else 0,
942
+ 8]),
943
+ pin_memory=True,
944
+ collate_fn=dataset.collate_fn)
945
+
946
+ seen = 0
947
+
948
+ coco91class = coco80_to_coco91_class()
949
+ s = ('%20s' + '%10s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R',
950
+ 'mAP@0.5', 'F1')
951
+ p, r, f1, mp, mr, map, mf1, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0.
952
+ loss = torch.zeros(3, device=device)
953
+ jdict, stats, ap, ap_class = [], [], [], []
954
+
955
+ for batch_i, (imgs, targets, paths, shapes) in enumerate(
956
+ tqdm(dataloader, desc=s)):
957
+ # uint8 to float32, 0 - 255 to 0.0 - 1.0
958
+ imgs = imgs.to(device).float() / 255.0
959
+ targets = targets.to(device)
960
+ nb, _, height, width = imgs.shape
961
+ # batch size, channels, height, width
962
+ whwh = torch.Tensor([width, height, width, height]).to(device)
963
+
964
+ if onnx_runtime:
965
+ outputs = onnx_model.run(
966
+ None, {onnx_model.get_inputs()[0].name: imgs.cpu().numpy()})
967
+ outputs = [torch.tensor(item).to(device) for item in outputs]
968
+ inf_out, train_out = post_process(outputs)
969
+
970
+ else:
971
+
972
+ # Disable gradients
973
+ with torch.no_grad():
974
+ # Run model
975
+ t = time_synchronized()
976
+
977
+ # inference and training outputs
978
+ inf_out, train_out = model(imgs, augment=augment)
979
+ t0 += time_synchronized() - t
980
+
981
+ # Compute loss
982
+ # if is_training: # if model has loss hyperparameters
983
+ # loss += compute_loss(train_out, targets, model)[1][:3] # GIoU,
984
+ # obj, cls
985
+
986
+ # Run NMS
987
+ t = time_synchronized()
988
+ output = non_max_suppression(
989
+ inf_out,
990
+ conf_thres=conf_thres,
991
+ iou_thres=iou_thres,
992
+ multi_label=multi_label)
993
+ t1 += time_synchronized() - t
994
+
995
+ # Statistics per image
996
+ for si, pred in enumerate(output):
997
+ labels = targets[targets[:, 0] == si, 1:]
998
+ nl = len(labels)
999
+ tcls = labels[:, 0].tolist() if nl else [] # target class
1000
+ seen += 1
1001
+
1002
+ if pred is None:
1003
+ if nl:
1004
+ stats.append(
1005
+ (torch.zeros(
1006
+ 0,
1007
+ niou,
1008
+ dtype=torch.bool),
1009
+ torch.Tensor(),
1010
+ torch.Tensor(),
1011
+ tcls))
1012
+ continue
1013
+
1014
+ # Append to text file
1015
+ # with open('test.txt', 'a') as file:
1016
+ # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred]
1017
+
1018
+ # Clip boxes to image bounds
1019
+ clip_coords(pred, (height, width))
1020
+
1021
+ # Append to pycocotools JSON dictionary
1022
+ if save_json:
1023
+ image_id = int(Path(paths[si]).stem.split('_')[-1])
1024
+ box = pred[:, :4].clone() # xyxy
1025
+ scale_coords(imgs[si].shape[1:], box, shapes[si]
1026
+ [0], shapes[si][1]) # to original shape
1027
+ box = xyxy2xywh(box) # xywh
1028
+ box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner
1029
+ for p, b in zip(pred.tolist(), box.tolist()):
1030
+ jdict.append({'image_id': image_id,
1031
+ 'category_id': coco91class[int(p[5])],
1032
+ 'bbox': [round(x, 3) for x in b],
1033
+ 'score': round(p[4], 5)})
1034
+
1035
+ # Assign all predictions as incorrect
1036
+ correct = torch.zeros(
1037
+ pred.shape[0],
1038
+ niou,
1039
+ dtype=torch.bool,
1040
+ device=device)
1041
+ if nl:
1042
+ detected = [] # target indices
1043
+ tcls_tensor = labels[:, 0]
1044
+
1045
+ # target boxes
1046
+ tbox = xywh2xyxy(labels[:, 1:5]) * whwh
1047
+
1048
+ # Per target class
1049
+ for cls in torch.unique(tcls_tensor):
1050
+ ti = (cls == tcls_tensor).nonzero(
1051
+ ).view(-1) # target indices
1052
+ pi = (cls == pred[:, 5]).nonzero(
1053
+ ).view(-1) # prediction indices
1054
+
1055
+ # Search for detections
1056
+ if pi.shape[0]:
1057
+ # Prediction to target ious
1058
+ ious, i = box_iou(pred[pi, :4], tbox[ti].cpu()).max(
1059
+ 1) # best ious, indices
1060
+
1061
+ # Append detections
1062
+ for j in (ious > iouv[0].cpu()).nonzero():
1063
+ d = ti[i[j]] # detected target
1064
+ if d not in detected:
1065
+ detected.append(d)
1066
+ # iou_thres is 1xn
1067
+ correct[pi[j]] = ious[j] > iouv.cpu()
1068
+ if len(
1069
+ detected) == nl:
1070
+ # all targets already located in image
1071
+ break
1072
+
1073
+ # Append statistics (correct, conf, pcls, tcls)
1074
+ stats.append(
1075
+ (correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))
1076
+
1077
+ # Plot images
1078
+ if batch_i < 1:
1079
+ f = 'test_batch%g_gt.jpg' % batch_i # filename
1080
+ plot_images(imgs, targets, paths=paths, names=names,
1081
+ fname=f) # ground truth
1082
+ f = 'test_batch%g_pred.jpg' % batch_i
1083
+ plot_images(imgs, output_to_target(output, width, height),
1084
+ paths=paths, names=names, fname=f) # predictions
1085
+
1086
+ # test end
1087
+
1088
+ # Compute statistics
1089
+ stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy
1090
+ if len(stats):
1091
+ p, r, ap, f1, ap_class = ap_per_class(*stats)
1092
+ if niou > 1:
1093
+ p, r, ap, f1 = p[:, 0], r[:, 0], ap.mean(
1094
+ 1), ap[:, 0] # [P, R, AP@0.5:0.95, AP@0.5]
1095
+ mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean()
1096
+ nt = np.bincount(stats[3].astype(np.int64),
1097
+ minlength=nc) # number of targets per class
1098
+ else:
1099
+ nt = torch.zeros(1)
1100
+
1101
+ # Print results
1102
+ pf = '%20s' + '%10.3g' * 6 # print format
1103
+ print(pf % ('all', seen, nt.sum(), mp, mr, map, mf1))
1104
+
1105
+ # Print results per class
1106
+ if verbose and nc > 1 and len(stats):
1107
+ for i, c in enumerate(ap_class):
1108
+ print(pf % (names[c], seen, nt[c], p[i], r[i], ap[i], f1[i]))
1109
+
1110
+ # Print speeds
1111
+ if verbose or save_json:
1112
+ t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + \
1113
+ (imgsz, imgsz, batch_size) # tuple
1114
+ print(
1115
+ 'Speed: %.1f/%.1f/%.1f ms '
1116
+ 'inference/NMS/total per %gx%g image at batch-size %g' % t)
1117
+
1118
+ # Save JSON
1119
+ if save_json and map and len(jdict):
1120
+ print('\nCOCO mAP with pycocotools...')
1121
+ imgIds = [int(Path(x).stem.split('_')[-1])
1122
+ for x in dataloader.dataset.img_files]
1123
+ with open('results.json', 'w') as file:
1124
+ json.dump(jdict, file)
1125
+
1126
+ try:
1127
+ from pycocotools.coco import COCO
1128
+ from pycocotools.cocoeval import COCOeval
1129
+
1130
+ # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
1131
+ # initialize COCO ground truth api
1132
+ cocoGt = COCO(
1133
+ glob.glob('coco/annotations/instances_val*.json')[0])
1134
+ cocoDt = cocoGt.loadRes('results.json') # initialize COCO pred api
1135
+ cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
1136
+ # [:32] # only evaluate these images
1137
+ cocoEval.params.imgIds = imgIds
1138
+ cocoEval.evaluate()
1139
+ cocoEval.accumulate()
1140
+ cocoEval.summarize()
1141
+ # mf1, map = cocoEval.stats[:2] # update to pycocotools results
1142
+ # (mAP@0.5:0.95, mAP@0.5)
1143
+ except BaseException:
1144
+ print(
1145
+ 'WARNING: pycocotools must be installed with '
1146
+ 'numpy==1.17 to run correctly. '
1147
+ 'See https://github.com/cocodataset/cocoapi/issues/356')
1148
+
1149
+ # Return results
1150
+ maps = np.zeros(nc) + map
1151
+ for i, c in enumerate(ap_class):
1152
+ maps[c] = ap[i]
1153
+ return (mp, mr, map, mf1, *(loss.cpu() / len(dataloader)).tolist()), maps
1154
+
1155
+
1156
+ if __name__ == '__main__':
1157
+ parser = argparse.ArgumentParser(description='Test ONNX model performance on the COCO dataset')
1158
+ parser.add_argument(
1159
+ '--data',
1160
+ type=str,
1161
+ default='coco2017.data',
1162
+ help='Path of *.data')
1163
+ parser.add_argument(
1164
+ '--batch-size',
1165
+ type=int,
1166
+ default=1,
1167
+ help='Size of each image batch')
1168
+ parser.add_argument(
1169
+ '--img-size',
1170
+ type=int,
1171
+ default=416,
1172
+ help='Inference size (pixels)')
1173
+ parser.add_argument(
1174
+ '--conf-thres',
1175
+ type=float,
1176
+ default=0.001,
1177
+ help='Object confidence threshold')
1178
+ parser.add_argument(
1179
+ '--iou-thres',
1180
+ type=float,
1181
+ default=0.5,
1182
+ help='IOU threshold for NMS')
1183
+ parser.add_argument(
1184
+ '--save-json',
1185
+ action='store_true',
1186
+ help='Save a COCOapi-compatible JSON results file')
1187
+ parser.add_argument(
1188
+ '--device',
1189
+ default='',
1190
+ help='Device id (i.e. 0 or 0,1) or cpu')
1191
+ parser.add_argument(
1192
+ '--augment',
1193
+ action='store_true',
1194
+ help='Augmented inference')
1195
+ parser.add_argument('--sync_bn', action='store_true')
1196
+ parser.add_argument('--print_model', action='store_true')
1197
+ parser.add_argument('--test_rect', action='store_true')
1198
+
1199
+ parser.add_argument(
1200
+ '--onnx_runtime',
1201
+ action='store_true',
1202
+ help='Use onnx runtime')
1203
+ parser.add_argument(
1204
+ '--onnx_weights',
1205
+ default='yolov3-8.onnx',
1206
+ nargs='+',
1207
+ type=str,
1208
+ help='Path of onnx weights')
1209
+ parser.add_argument(
1210
+ '--single-cls',
1211
+ action='store_true',
1212
+ help='Run as single-class dataset')
1213
+ parser.add_argument(
1214
+ "--ipu",
1215
+ action="store_true",
1216
+ help="Use IPU for inference")
1217
+ parser.add_argument(
1218
+ "--provider_config",
1219
+ type=str,
1220
+ default="vaip_config.json",
1221
+ help="Path of the config file for seting provider_options")
1222
+
1223
+ opt = parser.parse_args()
1224
+ opt.save_json = opt.save_json or any(
1225
+ [x in opt.data for x in ['coco.data',
1226
+ 'coco2014.data', 'coco2017.data']])
1227
+ opt.data = check_file(opt.data) # check file
1228
+ print(opt)
1229
+
1230
+ help_url = 'https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data'
1231
+ img_formats = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.dng']
1232
+ vid_formats = ['.mov', '.avi', '.mp4', '.mpg', '.mpeg', '.m4v', '.wmv',
1233
+ '.mkv']
1234
+
1235
+ test(opt.data,
1236
+ opt.batch_size,
1237
+ opt.img_size,
1238
+ opt.conf_thres,
1239
+ opt.iou_thres,
1240
+ opt.save_json,
1241
+ opt.single_cls,
1242
+ opt.augment,
1243
+ names='data/coco.names',
1244
+ onnx_weights=opt.onnx_weights,
1245
+ ipu=opt.ipu,
1246
+ provider_config=opt.provider_config
1247
+ )
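
Note on running the evaluation above: with the argparse flags defined in onnx_test.py, a COCO val2017 accuracy run can be launched as sketched below. The invocation is illustrative only; `vaip_config.json` is assumed to come from the Ryzen AI installation, and `coco2017.data` plus the dataset layout are prepared as described in README.md.
```sh
# Illustrative only: flag values mirror the argparse defaults defined above.
python onnx_test.py --data coco2017.data --img-size 416 --ipu \
    --onnx_weights yolov3-8.onnx --provider_config vaip_config.json
```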
requirements.txt ADDED
@@ -0,0 +1,29 @@
1
+ # pip install -r requirements.txt
2
+
3
+ # base ----------------------------------------
4
+ Cython
5
+ matplotlib>=3.2.2
6
+ numpy>=1.18.5
7
+ opencv-python>=4.1.2
8
+ pillow
9
+ PyYAML>=5.3
10
+ scipy>=1.4.1
11
+ tensorboard>=2.2
12
+ torch==1.12.0
13
+ torchvision>=0.7.0
14
+ tqdm>=4.41.0
15
+ pandas
16
+ #onnxruntime
17
+
18
+ # coco ----------------------------------------
19
+ pycocotools>=2.0
20
+
21
+ # export --------------------------------------
22
+ # packaging # for coremltools
23
+ # coremltools==4.0b3
24
+ # onnx>=1.7.0
25
+ # scikit-learn==0.19.2 # for coreml quantization
26
+
27
+ # extras --------------------------------------
28
+ # thop # FLOPS computation
29
+ # seaborn # plotting
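
Note: `onnxruntime` is deliberately commented out above, since the `--ipu` path relies on the ONNX Runtime build installed as part of the Ryzen AI setup rather than the stock PyPI wheel. A small sanity-check sketch follows; the provider name is an assumption based on the Ryzen AI documentation, not something pinned by this requirements file.
```python
# Sanity-check sketch: confirm the separately installed onnxruntime build
# exposes the Vitis AI execution provider used by onnx_test.py's --ipu path.
import onnxruntime

print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())
# Expect 'VitisAIExecutionProvider' (assumed name) to appear when the
# Ryzen AI environment is active; otherwise only CPU/GPU providers show up.
```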
utils.py ADDED
@@ -0,0 +1,213 @@
1
+ import numpy as np
2
+ import cv2
3
+ import torch
4
+ import time
5
+ import torchvision
6
+ import random
7
+
8
+
9
+ def box_iou(box1, box2):
10
+ # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py
11
+ """
12
+ Return intersection-over-union (Jaccard index) of boxes.
13
+ Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
14
+ Arguments:
15
+ box1 (Tensor[N, 4])
16
+ box2 (Tensor[M, 4])
17
+ Returns:
18
+ iou (Tensor[N, M]): the NxM matrix containing the pairwise
19
+ IoU values for every element in boxes1 and boxes2
20
+ """
21
+
22
+ def box_area(box):
23
+ # box = 4xn
24
+ return (box[2] - box[0]) * (box[3] - box[1])
25
+
26
+ area1 = box_area(box1.T)
27
+ area2 = box_area(box2.T)
28
+
29
+ # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)
30
+ inter = (torch.min(box1[:, None, 2:], box2[:, 2:]) - torch.max(box1[:, None, :2], box2[:, :2])).clamp(0).prod(2)
31
+ return inter / (area1[:, None] + area2 - inter) # iou = inter / (area1 + area2 - inter)
32
+
33
+
34
+ def plot_one_box(x, image, color=None, label=None, line_thickness=None):
35
+ # Plots one bounding box on image img
36
+ tl = line_thickness or round(
37
+ 0.002 * (image.shape[0] + image.shape[1]) / 2) + 1 # line/font thickness
38
+ color = color or [random.randint(0, 255) for _ in range(3)]
39
+ c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
40
+ cv2.rectangle(image, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
41
+ if label:
42
+ tf = max(tl - 1, 1) # font thickness
43
+ t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
44
+ c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
45
+ cv2.rectangle(image, c1, c2, color, -1, cv2.LINE_AA) # filled
46
+ cv2.putText(image, label, (c1[0], c1[1] - 2), 0, tl / 3,
47
+ [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
48
+
49
+
50
+ def clip_coords(boxes, img_shape):
51
+ # Clip xyxy bounding boxes to image shape (height, width)
52
+ boxes[:, 0].clamp_(0, img_shape[1]) # x1
53
+ boxes[:, 1].clamp_(0, img_shape[0]) # y1
54
+ boxes[:, 2].clamp_(0, img_shape[1]) # x2
55
+ boxes[:, 3].clamp_(0, img_shape[0]) # y2
56
+
57
+
58
+ def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
59
+ # Rescale coords (xyxy) from img1_shape to img0_shape
60
+ if ratio_pad is None: # calculate from img0_shape
61
+ gain = max(img1_shape) / max(img0_shape) # gain = old / new
62
+ pad = (img1_shape[1] - img0_shape[1] * gain) / \
63
+ 2, (img1_shape[0] - img0_shape[0] * gain) / 2 # wh padding
64
+ else:
65
+ gain = ratio_pad[0][0]
66
+ pad = ratio_pad[1]
67
+
68
+ coords[:, [0, 2]] -= pad[0] # x padding
69
+ coords[:, [1, 3]] -= pad[1] # y padding
70
+ coords[:, :4] /= gain
71
+ clip_coords(coords, img0_shape)
72
+ return coords
73
+
74
+
75
+ def xywh2xyxy(x):
76
+ # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where
77
+ # xy1=top-left, xy2=bottom-right
78
+ y = torch.zeros_like(x) if isinstance(
79
+ x, torch.Tensor) else np.zeros_like(x)
80
+ y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x
81
+ y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y
82
+ y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x
83
+ y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y
84
+ return y
85
+
86
+
87
+ def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True,
88
+ scaleFill=False, scaleup=True):
89
+ # Resize image to a 32-pixel-multiple rectangle
90
+ # https://github.com/ultralytics/yolov3/issues/232
91
+ shape = img.shape[:2] # current shape [height, width]
92
+ if isinstance(new_shape, int):
93
+ new_shape = (new_shape, new_shape)
94
+
95
+ # Scale ratio (new / old)
96
+ r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
97
+ if not scaleup: # only scale down, do not scale up (for better test mAP)
98
+ r = min(r, 1.0)
99
+
100
+ # Compute padding
101
+ ratio = r, r # width, height ratios
102
+ new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
103
+ dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - \
104
+ new_unpad[1] # wh padding
105
+ if auto: # minimum rectangle
106
+ dw, dh = np.mod(dw, 32), np.mod(dh, 32) # wh padding
107
+ elif scaleFill: # stretch
108
+ dw, dh = 0.0, 0.0
109
+ new_unpad = new_shape
110
+ ratio = new_shape[1] / shape[1], new_shape[0] / \
111
+ shape[0] # width, height ratios
112
+
113
+ dw /= 2 # divide padding into 2 sides
114
+ dh /= 2
115
+
116
+ if shape[::-1] != new_unpad: # resize
117
+ img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
118
+ top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
119
+ left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
120
+ img = cv2.copyMakeBorder(img, top, bottom, left, right,
121
+ cv2.BORDER_CONSTANT, value=color) # add border
122
+ return img, ratio, (dw, dh)
123
+
124
+
125
+ def non_max_suppression(
126
+ prediction,
127
+ conf_thres=0.1,
128
+ iou_thres=0.6,
129
+ multi_label=True,
130
+ classes=None,
131
+ agnostic=False):
132
+ """
133
+ Performs Non-Maximum Suppression on inference results
134
+ Returns detections with shape:
135
+ nx6 (x1, y1, x2, y2, conf, cls)
136
+ """
137
+
138
+ # Settings
139
+ merge = True # merge for best mAP
140
+ # (pixels) minimum and maximum box width and height
141
+ min_wh, max_wh = 2, 4096
142
+ time_limit = 10.0 # seconds to quit after
143
+
144
+ t = time.time()
145
+ nc = prediction[0].shape[1] - 5 # number of classes
146
+ multi_label &= nc > 1 # multiple labels per box
147
+ output = [None] * prediction.shape[0]
148
+ for xi, x in enumerate(prediction): # image index, image inference
149
+ # Apply constraints
150
+ x = x[x[:, 4] > conf_thres] # confidence
151
+ x = x[((x[:, 2:4] > min_wh) & (x[:, 2:4] < max_wh)).all(1)]
152
+
153
+ # If none remain process next image
154
+ if not x.shape[0]:
155
+ continue
156
+
157
+ # Compute conf
158
+ x[..., 5:] *= x[..., 4:5] # conf = obj_conf * cls_conf
159
+
160
+ # Box (center x, center y, width, height) to (x1, y1, x2, y2)
161
+ box = xywh2xyxy(x[:, :4])
162
+
163
+ # Detections matrix nx6 (xyxy, conf, cls)
164
+ if multi_label:
165
+ i, j = (x[:, 5:] > conf_thres).nonzero().t()
166
+ x = torch.cat((box[i], x[i, j + 5].unsqueeze(1),
167
+ j.float().unsqueeze(1)), 1)
168
+ else: # best class only
169
+ conf, j = x[:, 5:].max(1)
170
+ x = torch.cat(
171
+ (box, conf.unsqueeze(1), j.float().unsqueeze(1)), 1)[
172
+ conf > conf_thres]
173
+
174
+ # Filter by class
175
+ if classes:
176
+ x = x[(j.view(-1, 1) == torch.tensor(classes,
177
+ device=j.device)).any(1)]
178
+
179
+ # Apply finite constraint
180
+ # if not torch.isfinite(x).all():
181
+ # x = x[torch.isfinite(x).all(1)]
182
+
183
+ # If none remain process next image
184
+ n = x.shape[0] # number of boxes
185
+ if not n:
186
+ continue
187
+
188
+ # Sort by confidence
189
+ # x = x[x[:, 4].argsort(descending=True)]
190
+
191
+ # Batched NMS
192
+ c = x[:, 5] * 0 if agnostic else x[:, 5] # classes
193
+ boxes, scores = x[:, :4].clone() + c.view(-1, 1) * \
194
+ max_wh, x[:, 4] # boxes (offset by class), scores
195
+ i = torchvision.ops.boxes.nms(boxes, scores, iou_thres)
196
+ if merge and (
197
+ 1 < n < 3E3): # Merge NMS (boxes merged using weighted mean)
198
+ try: # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
199
+ iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix
200
+ weights = iou * scores[None] # box weights
201
+ x[i, :4] = torch.mm(weights, x[:, :4]).float(
202
+ ) / weights.sum(1, keepdim=True) # merged boxes
203
+ # i = i[iou.sum(1) > 1] # require redundancy
204
+ except BaseException:
205
+ # https://github.com/ultralytics/yolov3/issues/1139
206
+ # print(x, i, x.shape, i.shape)
207
+ pass
208
+
209
+ output[xi] = x[i]
210
+ if (time.time() - t) > time_limit:
211
+ break # time limit exceeded
212
+
213
+ return output
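
Taken together, the helpers above form the standalone pre/post-processing path used by onnx_inference.py and onnx_test.py: letterbox for resizing with padding, non_max_suppression on the decoded predictions, scale_coords to map boxes back to the original image, and plot_one_box for drawing. A minimal sketch of how they chain for one image is shown below; the image path is a placeholder, and the `(1, n, 85)` prediction tensor is assumed to come from the model's decoded ONNX outputs (the `post_process` step in onnx_inference.py), which is stubbed out here.
```python
# Sketch only: demonstrates how the utils above chain around the model.
import cv2
import torch

from utils import letterbox, non_max_suppression, scale_coords, plot_one_box

img0 = cv2.imread("demo.jpg")  # placeholder path; original BGR image
img, ratio, (dw, dh) = letterbox(img0, new_shape=(416, 416), auto=False)

# ... run the ONNX session on `img` and decode its raw outputs into a
# torch.Tensor of shape (1, n, 85), as onnx_inference.py's post_process does ...
pred = torch.zeros((1, 1, 85))  # stub standing in for the decoded outputs

det = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45,
                          multi_label=False)[0]
if det is not None and len(det):
    # map boxes from the 416x416 letterboxed frame back to img0 and draw them
    det[:, :4] = scale_coords(img.shape[:2], det[:, :4], img0.shape[:2])
    for *xyxy, conf, cls in det.tolist():
        plot_one_box(xyxy, img0, label=f"{int(cls)} {conf:.2f}")
```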
yolov3-8.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:caf441af4ff1b82258a7e308a89343f0e5ef9440e89383dde019e030a5b698f2
3
+ size 247866093
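
The three lines above are a Git LFS pointer rather than the ONNX graph itself; the actual file (247,866,093 bytes, roughly 236 MiB) is resolved from LFS storage, e.g. via `git lfs pull` after cloning. A hedged sketch for fetching it programmatically instead is below; the `repo_id` is an assumption taken from the model card's hub link, and `huggingface_hub` is not listed in requirements.txt.
```python
# Sketch: download the LFS-backed weights; repo_id is assumed, not verified.
from huggingface_hub import hf_hub_download

onnx_path = hf_hub_download(repo_id="amd/yolov3", filename="yolov3-8.onnx")
print(onnx_path)  # local cache path to pass to onnxruntime.InferenceSession
```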
yolov3.cfg ADDED
@@ -0,0 +1,788 @@
1
+ [net]
2
+ # Testing
3
+ #batch=1
4
+ #subdivisions=1
5
+ # Training
6
+ batch=16
7
+ subdivisions=1
8
+ width=416
9
+ height=416
10
+ channels=3
11
+ momentum=0.9
12
+ decay=0.0005
13
+ angle=0
14
+ saturation = 1.5
15
+ exposure = 1.5
16
+ hue=.1
17
+
18
+ learning_rate=0.001
19
+ burn_in=1000
20
+ max_batches = 500200
21
+ policy=steps
22
+ steps=400000,450000
23
+ scales=.1,.1
24
+
25
+ [convolutional]
26
+ batch_normalize=1
27
+ filters=32
28
+ size=3
29
+ stride=1
30
+ pad=1
31
+ activation=leaky
32
+
33
+ # Downsample
34
+
35
+ [convolutional]
36
+ batch_normalize=1
37
+ filters=64
38
+ size=3
39
+ stride=2
40
+ pad=1
41
+ activation=leaky
42
+
43
+ [convolutional]
44
+ batch_normalize=1
45
+ filters=32
46
+ size=1
47
+ stride=1
48
+ pad=1
49
+ activation=leaky
50
+
51
+ [convolutional]
52
+ batch_normalize=1
53
+ filters=64
54
+ size=3
55
+ stride=1
56
+ pad=1
57
+ activation=leaky
58
+
59
+ [shortcut]
60
+ from=-3
61
+ activation=linear
62
+
63
+ # Downsample
64
+
65
+ [convolutional]
66
+ batch_normalize=1
67
+ filters=128
68
+ size=3
69
+ stride=2
70
+ pad=1
71
+ activation=leaky
72
+
73
+ [convolutional]
74
+ batch_normalize=1
75
+ filters=64
76
+ size=1
77
+ stride=1
78
+ pad=1
79
+ activation=leaky
80
+
81
+ [convolutional]
82
+ batch_normalize=1
83
+ filters=128
84
+ size=3
85
+ stride=1
86
+ pad=1
87
+ activation=leaky
88
+
89
+ [shortcut]
90
+ from=-3
91
+ activation=linear
92
+
93
+ [convolutional]
94
+ batch_normalize=1
95
+ filters=64
96
+ size=1
97
+ stride=1
98
+ pad=1
99
+ activation=leaky
100
+
101
+ [convolutional]
102
+ batch_normalize=1
103
+ filters=128
104
+ size=3
105
+ stride=1
106
+ pad=1
107
+ activation=leaky
108
+
109
+ [shortcut]
110
+ from=-3
111
+ activation=linear
112
+
113
+ # Downsample
114
+
115
+ [convolutional]
116
+ batch_normalize=1
117
+ filters=256
118
+ size=3
119
+ stride=2
120
+ pad=1
121
+ activation=leaky
122
+
123
+ [convolutional]
124
+ batch_normalize=1
125
+ filters=128
126
+ size=1
127
+ stride=1
128
+ pad=1
129
+ activation=leaky
130
+
131
+ [convolutional]
132
+ batch_normalize=1
133
+ filters=256
134
+ size=3
135
+ stride=1
136
+ pad=1
137
+ activation=leaky
138
+
139
+ [shortcut]
140
+ from=-3
141
+ activation=linear
142
+
143
+ [convolutional]
144
+ batch_normalize=1
145
+ filters=128
146
+ size=1
147
+ stride=1
148
+ pad=1
149
+ activation=leaky
150
+
151
+ [convolutional]
152
+ batch_normalize=1
153
+ filters=256
154
+ size=3
155
+ stride=1
156
+ pad=1
157
+ activation=leaky
158
+
159
+ [shortcut]
160
+ from=-3
161
+ activation=linear
162
+
163
+ [convolutional]
164
+ batch_normalize=1
165
+ filters=128
166
+ size=1
167
+ stride=1
168
+ pad=1
169
+ activation=leaky
170
+
171
+ [convolutional]
172
+ batch_normalize=1
173
+ filters=256
174
+ size=3
175
+ stride=1
176
+ pad=1
177
+ activation=leaky
178
+
179
+ [shortcut]
180
+ from=-3
181
+ activation=linear
182
+
183
+ [convolutional]
184
+ batch_normalize=1
185
+ filters=128
186
+ size=1
187
+ stride=1
188
+ pad=1
189
+ activation=leaky
190
+
191
+ [convolutional]
192
+ batch_normalize=1
193
+ filters=256
194
+ size=3
195
+ stride=1
196
+ pad=1
197
+ activation=leaky
198
+
199
+ [shortcut]
200
+ from=-3
201
+ activation=linear
202
+
203
+
204
+ [convolutional]
205
+ batch_normalize=1
206
+ filters=128
207
+ size=1
208
+ stride=1
209
+ pad=1
210
+ activation=leaky
211
+
212
+ [convolutional]
213
+ batch_normalize=1
214
+ filters=256
215
+ size=3
216
+ stride=1
217
+ pad=1
218
+ activation=leaky
219
+
220
+ [shortcut]
221
+ from=-3
222
+ activation=linear
223
+
224
+ [convolutional]
225
+ batch_normalize=1
226
+ filters=128
227
+ size=1
228
+ stride=1
229
+ pad=1
230
+ activation=leaky
231
+
232
+ [convolutional]
233
+ batch_normalize=1
234
+ filters=256
235
+ size=3
236
+ stride=1
237
+ pad=1
238
+ activation=leaky
239
+
240
+ [shortcut]
241
+ from=-3
242
+ activation=linear
243
+
244
+ [convolutional]
245
+ batch_normalize=1
246
+ filters=128
247
+ size=1
248
+ stride=1
249
+ pad=1
250
+ activation=leaky
251
+
252
+ [convolutional]
253
+ batch_normalize=1
254
+ filters=256
255
+ size=3
256
+ stride=1
257
+ pad=1
258
+ activation=leaky
259
+
260
+ [shortcut]
261
+ from=-3
262
+ activation=linear
263
+
264
+ [convolutional]
265
+ batch_normalize=1
266
+ filters=128
267
+ size=1
268
+ stride=1
269
+ pad=1
270
+ activation=leaky
271
+
272
+ [convolutional]
273
+ batch_normalize=1
274
+ filters=256
275
+ size=3
276
+ stride=1
277
+ pad=1
278
+ activation=leaky
279
+
280
+ [shortcut]
281
+ from=-3
282
+ activation=linear
283
+
284
+ # Downsample
285
+
286
+ [convolutional]
287
+ batch_normalize=1
288
+ filters=512
289
+ size=3
290
+ stride=2
291
+ pad=1
292
+ activation=leaky
293
+
294
+ [convolutional]
295
+ batch_normalize=1
296
+ filters=256
297
+ size=1
298
+ stride=1
299
+ pad=1
300
+ activation=leaky
301
+
302
+ [convolutional]
303
+ batch_normalize=1
304
+ filters=512
305
+ size=3
306
+ stride=1
307
+ pad=1
308
+ activation=leaky
309
+
310
+ [shortcut]
311
+ from=-3
312
+ activation=linear
313
+
314
+
315
+ [convolutional]
316
+ batch_normalize=1
317
+ filters=256
318
+ size=1
319
+ stride=1
320
+ pad=1
321
+ activation=leaky
322
+
323
+ [convolutional]
324
+ batch_normalize=1
325
+ filters=512
326
+ size=3
327
+ stride=1
328
+ pad=1
329
+ activation=leaky
330
+
331
+ [shortcut]
332
+ from=-3
333
+ activation=linear
334
+
335
+
336
+ [convolutional]
337
+ batch_normalize=1
338
+ filters=256
339
+ size=1
340
+ stride=1
341
+ pad=1
342
+ activation=leaky
343
+
344
+ [convolutional]
345
+ batch_normalize=1
346
+ filters=512
347
+ size=3
348
+ stride=1
349
+ pad=1
350
+ activation=leaky
351
+
352
+ [shortcut]
353
+ from=-3
354
+ activation=linear
355
+
356
+
357
+ [convolutional]
358
+ batch_normalize=1
359
+ filters=256
360
+ size=1
361
+ stride=1
362
+ pad=1
363
+ activation=leaky
364
+
365
+ [convolutional]
366
+ batch_normalize=1
367
+ filters=512
368
+ size=3
369
+ stride=1
370
+ pad=1
371
+ activation=leaky
372
+
373
+ [shortcut]
374
+ from=-3
375
+ activation=linear
376
+
377
+ [convolutional]
378
+ batch_normalize=1
379
+ filters=256
380
+ size=1
381
+ stride=1
382
+ pad=1
383
+ activation=leaky
384
+
385
+ [convolutional]
386
+ batch_normalize=1
387
+ filters=512
388
+ size=3
389
+ stride=1
390
+ pad=1
391
+ activation=leaky
392
+
393
+ [shortcut]
394
+ from=-3
395
+ activation=linear
396
+
397
+
398
+ [convolutional]
399
+ batch_normalize=1
400
+ filters=256
401
+ size=1
402
+ stride=1
403
+ pad=1
404
+ activation=leaky
405
+
406
+ [convolutional]
407
+ batch_normalize=1
408
+ filters=512
409
+ size=3
410
+ stride=1
411
+ pad=1
412
+ activation=leaky
413
+
414
+ [shortcut]
415
+ from=-3
416
+ activation=linear
417
+
418
+
419
+ [convolutional]
420
+ batch_normalize=1
421
+ filters=256
422
+ size=1
423
+ stride=1
424
+ pad=1
425
+ activation=leaky
426
+
427
+ [convolutional]
428
+ batch_normalize=1
429
+ filters=512
430
+ size=3
431
+ stride=1
432
+ pad=1
433
+ activation=leaky
434
+
435
+ [shortcut]
436
+ from=-3
437
+ activation=linear
438
+
439
+ [convolutional]
440
+ batch_normalize=1
441
+ filters=256
442
+ size=1
443
+ stride=1
444
+ pad=1
445
+ activation=leaky
446
+
447
+ [convolutional]
448
+ batch_normalize=1
449
+ filters=512
450
+ size=3
451
+ stride=1
452
+ pad=1
453
+ activation=leaky
454
+
455
+ [shortcut]
456
+ from=-3
457
+ activation=linear
458
+
459
+ # Downsample
460
+
461
+ [convolutional]
462
+ batch_normalize=1
463
+ filters=1024
464
+ size=3
465
+ stride=2
466
+ pad=1
467
+ activation=leaky
468
+
469
+ [convolutional]
470
+ batch_normalize=1
471
+ filters=512
472
+ size=1
473
+ stride=1
474
+ pad=1
475
+ activation=leaky
476
+
477
+ [convolutional]
478
+ batch_normalize=1
479
+ filters=1024
480
+ size=3
481
+ stride=1
482
+ pad=1
483
+ activation=leaky
484
+
485
+ [shortcut]
486
+ from=-3
487
+ activation=linear
488
+
489
+ [convolutional]
490
+ batch_normalize=1
491
+ filters=512
492
+ size=1
493
+ stride=1
494
+ pad=1
495
+ activation=leaky
496
+
497
+ [convolutional]
498
+ batch_normalize=1
499
+ filters=1024
500
+ size=3
501
+ stride=1
502
+ pad=1
503
+ activation=leaky
504
+
505
+ [shortcut]
506
+ from=-3
507
+ activation=linear
508
+
509
+ [convolutional]
510
+ batch_normalize=1
511
+ filters=512
512
+ size=1
513
+ stride=1
514
+ pad=1
515
+ activation=leaky
516
+
517
+ [convolutional]
518
+ batch_normalize=1
519
+ filters=1024
520
+ size=3
521
+ stride=1
522
+ pad=1
523
+ activation=leaky
524
+
525
+ [shortcut]
526
+ from=-3
527
+ activation=linear
528
+
529
+ [convolutional]
530
+ batch_normalize=1
531
+ filters=512
532
+ size=1
533
+ stride=1
534
+ pad=1
535
+ activation=leaky
536
+
537
+ [convolutional]
538
+ batch_normalize=1
539
+ filters=1024
540
+ size=3
541
+ stride=1
542
+ pad=1
543
+ activation=leaky
544
+
545
+ [shortcut]
546
+ from=-3
547
+ activation=linear
548
+
549
+ ######################
550
+
551
+ [convolutional]
552
+ batch_normalize=1
553
+ filters=512
554
+ size=1
555
+ stride=1
556
+ pad=1
557
+ activation=leaky
558
+
559
+ [convolutional]
560
+ batch_normalize=1
561
+ size=3
562
+ stride=1
563
+ pad=1
564
+ filters=1024
565
+ activation=leaky
566
+
567
+ [convolutional]
568
+ batch_normalize=1
569
+ filters=512
570
+ size=1
571
+ stride=1
572
+ pad=1
573
+ activation=leaky
574
+
575
+ [convolutional]
576
+ batch_normalize=1
577
+ size=3
578
+ stride=1
579
+ pad=1
580
+ filters=1024
581
+ activation=leaky
582
+
583
+ [convolutional]
584
+ batch_normalize=1
585
+ filters=512
586
+ size=1
587
+ stride=1
588
+ pad=1
589
+ activation=leaky
590
+
591
+ [convolutional]
592
+ batch_normalize=1
593
+ size=3
594
+ stride=1
595
+ pad=1
596
+ filters=1024
597
+ activation=leaky
598
+
599
+ [convolutional]
600
+ size=1
601
+ stride=1
602
+ pad=1
603
+ filters=255
604
+ activation=linear
605
+
606
+
607
+ [yolo]
608
+ mask = 6,7,8
609
+ anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610
+ classes=80
611
+ num=9
612
+ jitter=.3
613
+ ignore_thresh = .7
614
+ truth_thresh = 1
615
+ random=1
616
+
617
+
618
+ [route]
619
+ layers = -4
620
+
621
+ [convolutional]
622
+ batch_normalize=1
623
+ filters=256
624
+ size=1
625
+ stride=1
626
+ pad=1
627
+ activation=leaky
628
+
629
+ [upsample]
630
+ stride=2
631
+
632
+ [route]
633
+ layers = -1, 61
634
+
635
+
636
+
637
+ [convolutional]
638
+ batch_normalize=1
639
+ filters=256
640
+ size=1
641
+ stride=1
642
+ pad=1
643
+ activation=leaky
644
+
645
+ [convolutional]
646
+ batch_normalize=1
647
+ size=3
648
+ stride=1
649
+ pad=1
650
+ filters=512
651
+ activation=leaky
652
+
653
+ [convolutional]
654
+ batch_normalize=1
655
+ filters=256
656
+ size=1
657
+ stride=1
658
+ pad=1
659
+ activation=leaky
660
+
661
+ [convolutional]
662
+ batch_normalize=1
663
+ size=3
664
+ stride=1
665
+ pad=1
666
+ filters=512
667
+ activation=leaky
668
+
669
+ [convolutional]
670
+ batch_normalize=1
671
+ filters=256
672
+ size=1
673
+ stride=1
674
+ pad=1
675
+ activation=leaky
676
+
677
+ [convolutional]
678
+ batch_normalize=1
679
+ size=3
680
+ stride=1
681
+ pad=1
682
+ filters=512
683
+ activation=leaky
684
+
685
+ [convolutional]
686
+ size=1
687
+ stride=1
688
+ pad=1
689
+ filters=255
690
+ activation=linear
691
+
692
+
693
+ [yolo]
694
+ mask = 3,4,5
695
+ anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696
+ classes=80
697
+ num=9
698
+ jitter=.3
699
+ ignore_thresh = .7
700
+ truth_thresh = 1
701
+ random=1
702
+
703
+
704
+
705
+ [route]
706
+ layers = -4
707
+
708
+ [convolutional]
709
+ batch_normalize=1
710
+ filters=128
711
+ size=1
712
+ stride=1
713
+ pad=1
714
+ activation=leaky
715
+
716
+ [upsample]
717
+ stride=2
718
+
719
+ [route]
720
+ layers = -1, 36
721
+
722
+
723
+
724
+ [convolutional]
725
+ batch_normalize=1
726
+ filters=128
727
+ size=1
728
+ stride=1
729
+ pad=1
730
+ activation=leaky
731
+
732
+ [convolutional]
733
+ batch_normalize=1
734
+ size=3
735
+ stride=1
736
+ pad=1
737
+ filters=256
738
+ activation=leaky
739
+
740
+ [convolutional]
741
+ batch_normalize=1
742
+ filters=128
743
+ size=1
744
+ stride=1
745
+ pad=1
746
+ activation=leaky
747
+
748
+ [convolutional]
749
+ batch_normalize=1
750
+ size=3
751
+ stride=1
752
+ pad=1
753
+ filters=256
754
+ activation=leaky
755
+
756
+ [convolutional]
757
+ batch_normalize=1
758
+ filters=128
759
+ size=1
760
+ stride=1
761
+ pad=1
762
+ activation=leaky
763
+
764
+ [convolutional]
765
+ batch_normalize=1
766
+ size=3
767
+ stride=1
768
+ pad=1
769
+ filters=256
770
+ activation=leaky
771
+
772
+ [convolutional]
773
+ size=1
774
+ stride=1
775
+ pad=1
776
+ filters=255
777
+ activation=linear
778
+
779
+
780
+ [yolo]
781
+ mask = 0,1,2
782
+ anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783
+ classes=80
784
+ num=9
785
+ jitter=.3
786
+ ignore_thresh = .7
787
+ truth_thresh = 1
788
+ random=1
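
The configuration above is the standard Darknet YOLOv3 definition: a Darknet-53 style backbone of [convolutional]/[shortcut] blocks, followed by three [yolo] detection heads at strides 32, 16 and 8 that pick anchor subsets 6,7,8, 3,4,5 and 0,1,2 out of the nine shared anchors. Each head is fed by a 1x1 convolution with filters=255 because 3 anchors x (80 classes + 5 box/objectness terms) = 255. A minimal reader for this block / key=value format is sketched below; it is illustrative only and not the loader used by the scripts in this commit.
```python
# Minimal sketch of a Darknet .cfg reader: yields each [section] as a dict.
def parse_cfg(path="yolov3.cfg"):
    sections = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                      # skip blanks and comments
            if line.startswith("["):          # new section header, e.g. [convolutional]
                sections.append({"type": line.strip("[]")})
            else:                             # key=value pair inside the current section
                key, value = line.split("=", 1)
                sections[-1][key.strip()] = value.strip()
    return sections


blocks = parse_cfg()
yolo_heads = [b for b in blocks if b["type"] == "yolo"]
# Expected: 3 heads with masks '6,7,8', '3,4,5', '0,1,2' and classes=80,
# matching the filters=255 convolutions directly above each [yolo] block.
print(len(yolo_heads), [b["mask"] for b in yolo_heads])
```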