Matthijs Hollemans committed on
Commit
cdc6976
1 Parent(s): 3caac63

add basic usage instructions

Files changed (1): README.md (+49 -0)
README.md CHANGED

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). However, the weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who already converted the weights from JAX to PyTorch. Credits go to him.

This repo contains a Core ML version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224).

## Usage instructions

Create a `VNCoreMLRequest` that loads the ViT model:

```swift
import CoreML
import Vision

lazy var classificationRequest: VNCoreMLRequest = {
    do {
        // Allow the model to run on the CPU, GPU, or Neural Engine.
        let config = MLModelConfiguration()
        config.computeUnits = .all

        // Load the Core ML model and wrap it for use with the Vision framework.
        let coreMLModel = try ViT(configuration: config)
        let visionModel = try VNCoreMLModel(for: coreMLModel.model)

        let request = VNCoreMLRequest(model: visionModel, completionHandler: { [weak self] request, error in
            if let results = request.results as? [VNClassificationObservation] {
                /* do something with the results */
            }
        })

        // Vision scales and crops the input image to the 224x224 pixels the model expects.
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Failed to create VNCoreMLModel: \(error)")
    }
}()
```
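
The completion handler above leaves the results handling as a placeholder. One possible way to fill it in (a sketch; the `processObservations(for:error:)` helper name and the top-5 formatting are assumptions, not part of this repo) is to read out the `VNClassificationObservation`s and report them on the main queue:

```swift
func processObservations(for request: VNRequest, error: Error?) {
    if let results = request.results as? [VNClassificationObservation] {
        // Vision returns classification observations sorted by confidence.
        let top5 = results.prefix(5)
            .map { String(format: "%@ %.1f%%", $0.identifier, $0.confidence * 100) }
            .joined(separator: "\n")

        // Update the UI (or just log) on the main queue.
        DispatchQueue.main.async {
            print(top5)
        }
    } else if let error = error {
        print("Classification error: \(error)")
    }
}
```

The completion handler would then call `self?.processObservations(for: request, error: error)` instead of the placeholder comment.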

Perform the request:

```swift
func classify(image: UIImage) {
    guard let ciImage = CIImage(image: image) else {
        print("Unable to create CIImage")
        return
    }

    // Run the request on a background queue so the main thread stays responsive.
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: ciImage, orientation: .up)
        do {
            try handler.perform([self.classificationRequest])
        } catch {
            print("Failed to perform classification: \(error)")
        }
    }
}
```
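
Note that the handler above hard-codes `.up` as the image orientation. If the `UIImage` may carry EXIF rotation (photos straight from the camera usually do), a common approach, not specific to this repo, is to translate `UIImage.Orientation` into the `CGImagePropertyOrientation` that `VNImageRequestHandler` expects. A minimal sketch:

```swift
import UIKit
import ImageIO

// Hypothetical helper that maps UIImage.Orientation to CGImagePropertyOrientation,
// so EXIF-rotated photos are classified the right way up.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up:            self = .up
        case .upMirrored:    self = .upMirrored
        case .down:          self = .down
        case .downMirrored:  self = .downMirrored
        case .left:          self = .left
        case .leftMirrored:  self = .leftMirrored
        case .right:         self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default:    self = .up
        }
    }
}
```

Inside `classify(image:)`, the request handler could then be created with `VNImageRequestHandler(ciImage: ciImage, orientation: CGImagePropertyOrientation(image.imageOrientation))`.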