|
--- |
|
license: cc-by-nc-sa-4.0 |
|
--- |
|
|
|
# ACT-Estimator |
|
|
|
This model is designed for use with [ACT-Bench](https://github.com/turingmotors/ACT-Bench) to compute evaluation metrics. It is a prediction model that reconstructs vehicle trajectories from driving videos generated by autonomous driving world models.

The predicted trajectory is compared against the instruction trajectory, which serves as the reference, to compute the ACT-Bench evaluation metrics.
|
|
|
|
|
## Model Summary |
|
|
|
- Developed by: Turing Inc. |
|
- License: CC-BY-NC-SA-4.0
- Model Size: 20.4M parameters
|
|
|
|
|
## Model Date |
|
|
|
`ACT-Estimator` was trained in November 2024.
|
|
|
|
|
## Model I/O |
|
|
|
**Input**

- `generated_videos`: Generated driving videos (shape: `(batch_size, 3, 44, 224, 224)`, i.e. 3 RGB channels, 44 frames, 224×224 resolution)
- `timestamps`: Timestamps of the 44 video frames (shape: `(batch_size, 44)`)

**Output**

- `command`: Logits over 9 command classes (shape: `(batch_size, 9)`)
- `waypoints`: Estimated waypoints (shape: `(batch_size, 44, 2)`)
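To make these shapes concrete, here is a minimal sketch of assembling both inputs from a list of RGB frames. The bilinear resize, the 2 Hz sampling rate, and the lack of normalization below are illustrative assumptions, not the documented ACT-Bench preprocessing pipeline.

```python
import torch
import torch.nn.functional as F

def build_inputs(frames, times):
    """Assemble model inputs from 44 RGB frames, each a (3, H, W) float tensor in [0, 1]."""
    assert len(frames) == len(times) == 44
    # Resize every frame to 224x224 and stack along a new time axis -> (3, 44, 224, 224)
    resized = [
        F.interpolate(f.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False).squeeze(0)
        for f in frames
    ]
    video = torch.stack(resized, dim=1)
    timestamps = torch.tensor(times, dtype=torch.float32)
    # Add the leading batch dimension expected by the model
    return video.unsqueeze(0), timestamps.unsqueeze(0)

# Synthetic example: 44 frames at an assumed 2 Hz sampling rate
frames = [torch.rand(3, 360, 640) for _ in range(44)]
times = [i * 0.5 for i in range(44)]
generated_videos, timestamps = build_inputs(frames, times)
print(generated_videos.shape)  # torch.Size([1, 3, 44, 224, 224])
print(timestamps.shape)        # torch.Size([1, 44])
```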
|
|
|
### Command Classes |
|
|
|
The `command` output represents the predicted high-level driving action.

The output is a vector of logits over the following 9 classes:

```python
LABELS = [
    "curving_to_left",               # command = 0
    "curving_to_right",              # command = 1
    "straight_constant_high_speed",  # command = 2
    "straight_constant_low_speed",   # command = 3
    "straight_accelerating",         # command = 4
    "straight_decelerating",         # command = 5
    "starting",                      # command = 6
    "stopping",                      # command = 7
    "stopped",                       # command = 8
]
```
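The predicted class can be read off the logits with an argmax. This is a minimal sketch, assuming `LABELS` as defined above; the random logits stand in for a real `command` tensor from the model:

```python
import torch

command_logits = torch.randn(1, 9)  # stand-in for out["command"] from the model

probs = torch.softmax(command_logits, dim=-1)  # per-class probabilities
pred_idx = int(command_logits.argmax(dim=-1))  # index of the most likely class
print(LABELS[pred_idx], f"p={probs[0, pred_idx].item():.2f}")
```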
|
|
|
|
|
### Waypoint Coordinate System |
|
|
|
The `waypoints` output consists of 44 waypoints, each representing the vehicle's position at the corresponding timestamp.

Each waypoint is a 2D vector `(x, y)` in a Cartesian coordinate system defined as follows:

- The origin `(0, 0)` is the vehicle's position at the start of the video.
- The `x`-axis is the lateral direction of the vehicle; positive values indicate movement to the right.
- The `y`-axis is the forward direction of the vehicle; positive values indicate forward movement.
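As a quick illustration of this convention, the sketch below derives a few simple quantities from a `waypoints` tensor. The variable names are illustrative, and the random cumulative-sum input is only a placeholder for real model output:

```python
import torch

waypoints = torch.cumsum(torch.rand(1, 44, 2), dim=1)  # stand-in for out["waypoints"]

deltas = waypoints[:, 1:] - waypoints[:, :-1]   # displacement between consecutive waypoints
path_length = deltas.norm(dim=-1).sum(dim=-1)   # total distance traveled along the path
lateral_offset = waypoints[:, -1, 0]            # final x: positive means drifted right
forward_distance = waypoints[:, -1, 1]          # final y: forward progress from the start

print(float(path_length), float(lateral_offset), float(forward_distance))
```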
|
|
|
|
|
## Training Dataset |
|
|
|
- Video frame-trajectory pairs from the [nuScenes](https://www.nuscenes.org/) dataset. Details are described in our [paper](https://arxiv.org/abs/2412.05337).
|
|
|
|
|
## Authors |
|
|
|
Here are the team members who contributed to the development of `ACT-Bench` and `ACT-Estimator`: |
|
|
|
- Hidehisa Arai |
|
- Keishi Ishihara |
|
- Tsubasa Takahashi |
|
- Yu Yamaguchi |
|
|
|
|
|
## How to use |
|
|
|
```python
import torch
from transformers import AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

act_estimator = AutoModel.from_pretrained("turing-motors/Act-Estimator", trust_remote_code=True)
act_estimator.to(device)
act_estimator.eval()

# dummy inputs; replace with real generated videos and frame timestamps
generated_videos = torch.randn(1, 3, 44, 224, 224).to(device)
timestamps = torch.randn(1, 44).to(device)

with torch.no_grad():
    out = act_estimator(generated_videos, timestamps)
print(out.keys())
# dict_keys(['command', 'waypoints'])

print(out["command"].size())    # torch.Size([1, 9])
print(out["waypoints"].size())  # torch.Size([1, 44, 2])
```
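Within ACT-Bench, the predicted waypoints are compared against the instruction trajectory; the official metrics are computed by the benchmark code itself. As a rough sketch of such a comparison, an average displacement error (ADE) over a hypothetical `reference_waypoints` tensor could look like this:

```python
import torch

# hypothetical stand-in for the instruction (reference) trajectory, shape (1, 44, 2)
reference_waypoints = torch.zeros(1, 44, 2)

# mean Euclidean distance between predicted and reference waypoints;
# this is illustrative only, not the official ACT-Bench metric computation
ade = (out["waypoints"].cpu() - reference_waypoints).norm(dim=-1).mean()
print(f"ADE: {ade.item():.3f}")
```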
|
|
|
|
|
## License |
|
|
|
`ACT-Estimator` is licensed under CC-BY-NC-SA-4.0.
|
|
|
|
|
## Citation |
|
|
|
If you find our work helpful, please feel free to cite us. |
|
|
|
```bibtex
@misc{arai2024actbench,
      title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving},
      author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi},
      year={2024},
      eprint={2412.05337},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05337},
}
```