---
license: cc-by-nc-sa-4.0
---

# ACT-Estimator

This model is designed for use with [ACT-Bench](https://github.com/turingmotors/ACT-Bench) to compute its evaluation metrics. It estimates the driving trajectory from videos generated by autonomous driving world models; the estimated trajectory is then compared against the instruction trajectory, which serves as the reference, to calculate the ACT-Bench metrics.


## Model Summary

- Developed by: Turing Inc.
- License: CC-BY-NC-SA-4.0
- Model Size: 20.4M parameters


## Model Date

`ACT-Estimator` was trained in November 2024.


## Model I/O

Input

- `generated_videos`: Generated driving videos (shape: (batch_size, 3, 44, 224, 224))
- `timestamps`: Timestamps of the generated videos (shape: (batch_size, 44))

Output

- `command`: Logits of 9 classes (shape: (batch_size, 9))
- `waypoints`: Estimated waypoints (shape: (batch_size, 44, 2))
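
For example, a generated clip can be shaped into the input format above roughly as follows. This is a minimal sketch: the exact preprocessing (pixel normalization, frame sampling) should follow the ACT-Bench pipeline, and the `fps` value here is an assumed placeholder, not something this card specifies.

```python
import torch
import torch.nn.functional as F

def prepare_inputs(frames: torch.Tensor, fps: float = 10.0):
    """Shape a (44, 3, H, W) frame stack into the model's expected input.

    `fps` is an assumed placeholder used to build dummy timestamps;
    replace it with the true capture times of your generated video.
    """
    assert frames.shape[0] == 44, "the model expects exactly 44 frames"
    # resize each frame to 224x224
    frames = F.interpolate(frames.float(), size=(224, 224), mode="bilinear", align_corners=False)
    # (44, 3, 224, 224) -> (1, 3, 44, 224, 224)
    generated_videos = frames.permute(1, 0, 2, 3).unsqueeze(0)
    # (1, 44) timestamps in seconds, monotonically increasing
    timestamps = (torch.arange(44, dtype=torch.float32) / fps).unsqueeze(0)
    return generated_videos, timestamps
```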

### Command Classes

The `command` output represents the predicted high-level driving action.
The output is a vector of logits over the following 9 classes:

```python
LABELS = [
    "curving_to_left",              # command = 0
    "curving_to_right",             # command = 1
    "straight_constant_high_speed", # command = 2
    "straight_constant_low_speed",  # command = 3
    "straight_accelerating",        # command = 4
    "straight_decelerating",        # command = 5
    "starting",                     # command = 6
    "stopping",                     # command = 7
    "stopped",                      # command = 8
]
```
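
The predicted class can be recovered by taking the argmax over these logits. A minimal sketch, using a random tensor as a stand-in for the model's `command` output:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 9)          # stand-in for out["command"]
probs = F.softmax(logits, dim=-1)   # convert logits to class probabilities
command = int(logits.argmax(dim=-1))
print(LABELS[command], float(probs[0, command]))
```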


### Waypoint Coordinate System

The `waypoints` output consists of 44 waypoints, each giving the vehicle's position at the corresponding timestamp.
Each waypoint is a 2D vector `(x, y)` in a Cartesian coordinate system.

- The origin `(0, 0)` is defined as the initial position of the vehicle at the start of the video.
- The `x`-axis corresponds to the lateral direction of the vehicle, with positive values indicating movement to the right.
- The `y`-axis corresponds to the forward direction of the vehicle, with positive values indicating forward movement.
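
Under this convention, a few trajectory-level quantities fall out directly. A minimal sketch, assuming the waypoints are in meters (the unit is not stated on this card):

```python
import torch

def summarize_trajectory(waypoints: torch.Tensor) -> None:
    """Summarize a (44, 2) waypoint sequence in the (x=lateral, y=forward) frame."""
    deltas = waypoints[1:] - waypoints[:-1]   # displacement between consecutive waypoints
    path_length = deltas.norm(dim=-1).sum()   # total distance traveled
    final_x, final_y = waypoints[-1]          # net lateral / forward offset from the origin
    side = "right" if final_x > 0 else "left"
    print(f"traveled {path_length:.1f} m, ended {abs(final_x):.1f} m to the {side}, "
          f"{final_y:.1f} m ahead of the start")
```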


## Training Dataset

- Video frame-trajectory pairs from the [nuScenes](https://www.nuscenes.org/) dataset. Details are described in our [paper](https://arxiv.org/abs/2412.05337).


## Authors

Here are the team members who contributed to the development of `ACT-Bench` and `ACT-Estimator`:

- Hidehisa Arai
- Keishi Ishihara
- Tsubasa Takahashi
- Yu Yamaguchi


## How to use

```python
import torch
from transformers import AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

act_estimator = AutoModel.from_pretrained("turing-motors/Act-Estimator", trust_remote_code=True)
act_estimator.to(device)
act_estimator.eval()

# dummy inputs; real timestamps should be monotonically increasing in seconds
generated_videos = torch.randn(1, 3, 44, 224, 224).to(device)
timestamps = torch.randn(1, 44).to(device)

with torch.no_grad():  # inference only, no gradients needed
    out = act_estimator(generated_videos, timestamps)
print(out.keys())
# dict_keys(['command', 'waypoints'])

print(out["command"].size())    # torch.Size([1, 9])
print(out["waypoints"].size())  # torch.Size([1, 44, 2])
```
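
To close the loop described at the top of this card, the estimated trajectory can then be compared against the instruction trajectory. The sketch below uses plain average/final displacement error with a zero tensor as a stand-in for the reference; the actual ACT-Bench metrics are defined in the benchmark repository, so treat this as illustrative only.

```python
pred = out["waypoints"][0].cpu()          # (44, 2) estimated trajectory
reference = torch.zeros(44, 2)            # stand-in for the instruction trajectory

errors = (pred - reference).norm(dim=-1)  # per-waypoint Euclidean error
ade = errors.mean()                       # average displacement error
fde = errors[-1]                          # final displacement error
print(f"ADE: {ade:.3f}, FDE: {fde:.3f}")
```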


## License

`ACT-Estimator` is licensed under CC-BY-NC-SA-4.0.


## Citation

If you find our work helpful, please feel free to cite us.

```bibtex
@misc{arai2024actbench,
      title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving},
      author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi},
      year={2024},
      eprint={2412.05337},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05337},
}
```