UnityGiles committed on
Commit
712055b
1 Parent(s): 6394d93

Create README.md

Files changed (1)
  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: apache-2.0
+ library_name: unity-sentis
+ pipeline_tag: object-detection
+ ---
+
+ # BlazeFace in Sentis
+
+ BlazeFace is a fast, lightweight face detector from Google Research. A pretrained model is available as part of Google's [MediaPipe](https://ai.google.dev/edge/mediapipe/solutions/vision/face_detector) framework.
+
+ ![](../images/face.jpg)
+
+ The BlazeFace model has been converted from TFLite to ONNX for use in Sentis using [tf2onnx](https://github.com/onnx/tensorflow-onnx) with the default export parameters.
+
+ ## Functional API
+
+ The BlazeFace model takes a (1, 128, 128, 3) input image tensor and outputs a (1, 896, 16) boxes tensor and a (1, 896, 1) scores tensor.
+
+ Each of the 896 boxes consists of:
+ - [x position, y position, width, height] for the bounding box. The position is relative to the anchor position for the given index. The anchors are precalculated and loaded from a CSV file.
+ - [x position, y position] for each of 6 facial keypoints, relative to the anchor position.
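
The box layout described above can be sketched in plain C# without any Sentis types. This is a minimal illustration, not the sample's own code: the class and method names are hypothetical, the anchors are assumed to be normalized to [0, 1] (as suggested by the multiplication by the input size in the snippet that follows), and the keypoints are assumed to occupy indices 4 to 15 as consecutive (x, y) pairs.

```csharp
using System;

// Hypothetical helper, not part of Sentis or the sample project.
public static class BlazeFaceDecode
{
    public const int NumKeypoints = 6;
    public const int ValuesPerBox = 4 + 2 * NumKeypoints; // 16 values per box

    // Decodes one raw box (16 floats) into tensor pixel coordinates.
    // Assumption: anchorX and anchorY are normalized to [0, 1].
    public static (float xCenter, float yCenter, float width, float height, float[] keypoints)
        Decode(float[] raw, float anchorX, float anchorY, float inputSize)
    {
        if (raw.Length != ValuesPerBox)
            throw new ArgumentException($"Expected {ValuesPerBox} values per box.", nameof(raw));

        // Anchor position expressed in pixels of the detector input.
        float offsetX = anchorX * inputSize;
        float offsetY = anchorY * inputSize;

        // Assumption: keypoints are stored as (x, y) pairs after the 4 box values.
        var keypoints = new float[2 * NumKeypoints];
        for (int k = 0; k < NumKeypoints; k++)
        {
            keypoints[2 * k] = raw[4 + 2 * k] + offsetX;
            keypoints[2 * k + 1] = raw[4 + 2 * k + 1] + offsetY;
        }

        // Only the positions are anchor-relative; width and height are used as-is.
        return (raw[0] + offsetX, raw[1] + offsetY, raw[2], raw[3], keypoints);
    }
}
```

For example, a raw box starting with (2, -4, 40, 50) at an anchor of (0.5, 0.25) and an input size of 128 decodes to a center of (66, 28) with width 40 and height 50.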
+
+ We adapt the model using the Sentis functional API to apply non-maximum suppression, which keeps the highest-scoring boxes that do not overlap with each other.
+ ```
+ // Box centers: raw offsets plus the anchor positions scaled to the input size
+ var xCenter = rawBoxes[0, .., 0] + anchors[.., 0] * inputSize;
+ var yCenter = rawBoxes[0, .., 1] + anchors[.., 1] * inputSize;
+
+ var widthHalf = 0.5f * rawBoxes[0, .., 2];
+ var heightHalf = 0.5f * rawBoxes[0, .., 3];
+
+ // Convert center/size to corner coordinates, stacked along axis 1
+ var nmsBoxes = Functional.Stack(new[]
+ {
+ yCenter - heightHalf,
+ xCenter - widthHalf,
+ yCenter + heightHalf,
+ xCenter + widthHalf
+ }, 1);
+
+ // ScoreFiltering is a helper defined elsewhere in the sample
+ var nmsScores = Functional.Squeeze(ScoreFiltering(rawScores, 100f));
+ var selectedIndices = Functional.NMS(nmsBoxes, nmsScores, iouThreshold, scoreThreshold); // (N)
+
+ var selectedBoxes = Functional.IndexSelect(rawBoxes, 1, selectedIndices).Unsqueeze(0); // (1, N, 16)
+ var selectedScores = Functional.IndexSelect(rawScores, 1, selectedIndices).Unsqueeze(0); // (1, N, 1)
+ ```
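
To make the filtering step concrete, here is a minimal CPU sketch of greedy non-maximum suppression in plain C#. It is not the Sentis implementation: the class name, the `>=` score comparison and the exact tie-breaking are assumptions made for illustration. Boxes use the same (yMin, xMin, yMax, xMax) order as the stacked tensor above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical CPU reference, not the Sentis NMS operator.
public static class GreedyNms
{
    // Intersection-over-union of two boxes given as (yMin, xMin, yMax, xMax).
    public static float IoU(float[] a, float[] b)
    {
        float interH = Math.Max(0f, Math.Min(a[2], b[2]) - Math.Max(a[0], b[0]));
        float interW = Math.Max(0f, Math.Min(a[3], b[3]) - Math.Max(a[1], b[1]));
        float inter = interH * interW;
        float areaA = (a[2] - a[0]) * (a[3] - a[1]);
        float areaB = (b[2] - b[0]) * (b[3] - b[1]);
        float union = areaA + areaB - inter;
        return union > 0f ? inter / union : 0f;
    }

    // Returns the indices of the kept boxes, ordered from highest to lowest score.
    public static List<int> Select(float[][] boxes, float[] scores, float iouThreshold, float scoreThreshold)
    {
        // Drop low-scoring boxes, then visit the rest from best to worst.
        var candidates = Enumerable.Range(0, scores.Length)
            .Where(i => scores[i] >= scoreThreshold)
            .OrderByDescending(i => scores[i])
            .ToList();

        var kept = new List<int>();
        foreach (int i in candidates)
        {
            // Keep a box only if it does not overlap too much with any box already kept.
            if (kept.All(j => IoU(boxes[i], boxes[j]) <= iouThreshold))
                kept.Add(i);
        }
        return kept;
    }
}
```

With two heavily overlapping boxes scored 0.9 and 0.8 and a separate box scored 0.7, an IoU threshold of 0.3 keeps the first and third boxes and suppresses the second.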
+
+ ## Model inference
+
+ We use the dimensions of the texture to set up an affine transformation matrix that maps the 128x128 tensor coordinates to image coordinates. We then fill the input tensor using a compute shader with this affine transformation. Points outside the image correspond to zeros in the input tensor.
+ ```
+ var size = Mathf.Max(texture.width, texture.height);
+
+ // The affine transformation matrix to go from tensor coordinates to image coordinates
+ var scale = size / (float)detectorInputSize;
+ var M = BlazeUtils.mul(
+     BlazeUtils.TranslationMatrix(0.5f * (new Vector2(texture.width, texture.height) + new Vector2(-size, size))),
+     BlazeUtils.ScaleMatrix(new Vector2(scale, -scale)));
+ BlazeUtils.SampleImageAffine(texture, m_DetectorInput, M);
+
+ m_FaceDetectorWorker.Schedule(m_DetectorInput);
+ ```
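
The effect of this matrix can be checked with a small worked example in plain C#. The sketch below assumes that `BlazeUtils.mul(A, B)` composes the matrices so that the scale is applied first and the translation second; that ordering, along with the class and method names, is an assumption for illustration and is not confirmed by the snippet above.

```csharp
using System;

// Hypothetical helper that mirrors the matrix above with scalar arithmetic.
public static class TensorToImage
{
    // Maps a point from detector-input (tensor) coordinates to image pixel
    // coordinates: scale (with a flipped y axis) first, then translate.
    public static (float x, float y) Map(float px, float py, int imageWidth, int imageHeight, int detectorInputSize)
    {
        int size = Math.Max(imageWidth, imageHeight);
        float scale = size / (float)detectorInputSize;
        float translateX = 0.5f * (imageWidth - size);
        float translateY = 0.5f * (imageHeight + size);
        return (scale * px + translateX, -scale * py + translateY);
    }

    // True when the mapped point lands on the image; other points would be zero-filled.
    public static bool IsInsideImage((float x, float y) p, int imageWidth, int imageHeight)
        => p.x >= 0f && p.x < imageWidth && p.y >= 0f && p.y < imageHeight;
}
```

For a 640x480 texture and a 128x128 input, the scale is 5. Under the stated assumption, the tensor center (64, 64) maps to the image center (320, 240), while the tensor corner (0, 0) maps to (0, 560), which lies outside the image and therefore falls in the zero-padded border.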
+
+ Execution is scheduled using an [Awaitable](https://docs.unity3d.com/6000.0/Documentation/ScriptReference/Awaitable.html), and the output tensors are downloaded and awaited. This frees up the main thread while the GPU computation and download take place.
+ ```
+ // Start all three readbacks before awaiting any of them
+ var outputIndicesAwaitable = (m_FaceDetectorWorker.PeekOutput(0) as Tensor<int>).ReadbackAndCloneAsync();
+ var outputScoresAwaitable = (m_FaceDetectorWorker.PeekOutput(1) as Tensor<float>).ReadbackAndCloneAsync();
+ var outputBoxesAwaitable = (m_FaceDetectorWorker.PeekOutput(2) as Tensor<float>).ReadbackAndCloneAsync();
+
+ using var outputIndices = await outputIndicesAwaitable;
+ using var outputScores = await outputScoresAwaitable;
+ using var outputBoxes = await outputBoxesAwaitable;
+ ```
+ The output tensors are now on the CPU and can be read. We use the values from the output tensors together with the affine transformation matrix to set the transforms on the bounding boxes and keypoints for visualization.
+
+ In this demo we visualize the four faces with the highest scores that pass the score threshold.
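
Picking the faces to draw can be sketched as a small selection over the downloaded scores. This is an illustrative plain C# version, not the demo's own code: the class name, the `maxFaces` parameter and the `>=` comparison are assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the sample project.
public static class TopFaces
{
    // Returns the indices of up to maxFaces detections whose score is at or
    // above the threshold, ordered from highest to lowest score.
    public static int[] Select(float[] scores, float scoreThreshold, int maxFaces = 4)
    {
        return Enumerable.Range(0, scores.Length)
            .Where(i => scores[i] >= scoreThreshold)
            .OrderByDescending(i => scores[i])
            .Take(maxFaces)
            .ToArray();
    }
}
```

Each returned index can then be used to look up the matching row of the boxes tensor when placing the bounding box and keypoints.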
+
+ ## Notes
+ This model has been trained primarily for short-range faces in images taken using the front-facing smartphone camera. Results may be poor for longer-range images of faces.
+
+ The non-maximum suppression operator requires a blocking GPU readback, which prevents this demo from running on the WebGPU backend in Unity 6 and Sentis 2.0.