---
license: cc-by-nc-4.0
tags:
- vision
- cotracker
---
# Point tracking with CoTracker



**CoTracker** is a fast transformer-based model introduced in [CoTracker: It is Better to Track Together](https://arxiv.org/abs/2307.07635).
It can track any point in a video and brings some of the benefits of optical flow to point tracking.

CoTracker can track:

- **Any pixel** in a video
- A **quasi-dense** set of pixels together
- Points that are **manually selected** or sampled on a regular grid in any video frame (see the query-points sketch after the offline example below)


## How to use
Here is how to use the model in **offline mode**. First install the video reader with ```pip install imageio[ffmpeg]```, then:
```python
import torch
import imageio.v3 as iio

# Download the demo video from the CoTracker repository
# (if the blob URL does not resolve to the raw file, try appending "?raw=true")
url = 'https://github.com/facebookresearch/co-tracker/blob/main/assets/apple.mp4'
frames = iio.imread(url, plugin="FFMPEG")  # or plugin="pyav"

device = 'cuda' if torch.cuda.is_available() else 'cpu'
grid_size = 10
# Convert the frames to a float tensor of shape B T C H W
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)

# Run Offline CoTracker on a regular grid of grid_size x grid_size points:
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
pred_tracks, pred_visibility = cotracker(video, grid_size=grid_size)  # B T N 2,  B T N 1
```
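
CoTracker can also track manually selected points rather than a grid. The snippet below is a minimal sketch (not part of the original card) that reuses the `video`, `device`, and offline `cotracker` objects from the example above; the query coordinates are arbitrary illustrative values. Queries are passed as a `B N 3` tensor of `(frame_index, x, y)`:

```python
# A minimal sketch: track two manually chosen points instead of a grid.
# The coordinates below are arbitrary example values.
queries = torch.tensor([
    [0., 400., 350.],   # (frame index, x, y): track this point from frame 0
    [10., 600., 500.],  # this point starts being tracked at frame 10
], device=device)[None]  # add a batch dimension -> B N 3

pred_tracks, pred_visibility = cotracker(video, queries=queries)  # B T N 2,  B T N 1
```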
Here is how to run the same model in **online mode**:
```python
# Run Online CoTracker -- the same model with a sliding-window API:
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2_online").to(device)

# Initialize online processing on the first chunk
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)

# Process the video in windows of 2 * cotracker.step frames,
# advancing by cotracker.step frames each iteration
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )  # B T N 2,  B T N 1
```
Online processing is more memory-efficient and makes it possible to handle longer videos or video streams in real time.
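
For videos that are too long to load into memory at once, frames can be fed to the online model as they are decoded. The following is a rough sketch in the spirit of the repository's online demo, assuming a hypothetical local file `long_video.mp4` and reusing `device`, `grid_size`, and the `cotracker2_online` model loaded above:

```python
import numpy as np
import imageio.v3 as iio

def process_step(frames, is_first_step):
    # Stack the buffered frames into a B T C H W float tensor and run one online step
    chunk = torch.tensor(np.stack(frames)).permute(0, 3, 1, 2)[None].float().to(device)
    return cotracker(video_chunk=chunk, is_first_step=is_first_step, grid_size=grid_size)

window_frames = []
is_first_step = True
for i, frame in enumerate(iio.imiter("long_video.mp4", plugin="FFMPEG")):
    if i % cotracker.step == 0 and i != 0:
        # Process the last 2 * step buffered frames, then slide forward by step frames
        pred_tracks, pred_visibility = process_step(
            window_frames[-cotracker.step * 2:], is_first_step
        )
        is_first_step = False
    window_frames.append(frame)
```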

## BibTeX entry and citation info

```bibtex
@article{karaev2023cotracker,
  title={CoTracker: It is Better to Track Together},
  author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
  journal={arXiv:2307.07635},
  year={2023}
}
```