# ACT-Estimator
This model is used with ACT-Bench to compute its evaluation metrics. It reconstructs the driving trajectory depicted in videos generated by autonomous driving world models; the predicted trajectory is then compared against the instruction trajectory, which serves as the reference, to calculate the ACT-Bench metrics.
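For intuition, comparisons between a predicted and a reference trajectory are commonly summarized with displacement-based measures. The sketch below computes an average displacement error (ADE) over paired waypoints; it is illustrative only, and not necessarily the exact metric ACT-Bench reports.

```python
import torch

def average_displacement_error(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Mean Euclidean distance between corresponding (x, y) waypoints.
    # Both inputs have shape (num_waypoints, 2).
    return torch.linalg.norm(pred - ref, dim=-1).mean()

# Hypothetical trajectories with 44 waypoints each
predicted = torch.randn(44, 2)
reference = torch.randn(44, 2)
print(average_displacement_error(predicted, reference))
```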
## Model Summary
- Developed by: Turing Inc.
- License: CC-BY-NC-SA-4.0
- Model Size: 20.4M parameters
## Model Date
ACT-Estimator was trained in November 2024.
## Model I/O

### Input
- `generated_videos`: Generated driving videos (shape: `(batch_size, 3, 44, 224, 224)`)
- `timestamps`: Timestamps of the generated videos (shape: `(batch_size, 44)`)
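As a rough sketch, inputs of these shapes can be assembled from 44 RGB frames resized to 224×224. The tensor layout follows the shapes above, but the `[0, 1]` pixel scaling and the 10 Hz frame rate below are assumptions; check the model's actual preprocessing before relying on them.

```python
import numpy as np
import torch

# 44 RGB frames of shape (224, 224, 3), e.g. decoded from a generated video;
# zeros here stand in for real pixel data
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(44)]

# Stack to (44, 224, 224, 3), move channels first, then add a batch dimension
video = torch.from_numpy(np.stack(frames)).permute(3, 0, 1, 2).float()
generated_videos = video.unsqueeze(0) / 255.0  # (1, 3, 44, 224, 224); [0, 1] scaling is an assumption

# One timestamp in seconds per frame; a 10 Hz frame rate is an assumption
timestamps = torch.arange(44, dtype=torch.float32).unsqueeze(0) / 10.0  # (1, 44)
```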
### Output
- `command`: Logits over the 9 command classes (shape: `(batch_size, 9)`)
- `waypoints`: Estimated waypoints (shape: `(batch_size, 44, 2)`)
## Command Classes

The `command` output represents the predicted high-level driving action, expressed as logits over the following 9 classes:
```python
LABELS = [
    "curving_to_left",               # command = 0
    "curving_to_right",              # command = 1
    "straight_constant_high_speed",  # command = 2
    "straight_constant_low_speed",   # command = 3
    "straight_accelerating",         # command = 4
    "straight_decelerating",         # command = 5
    "starting",                      # command = 6
    "stopping",                      # command = 7
    "stopped",                       # command = 8
]
```
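Given these logits, the predicted command can be recovered with an argmax over the class dimension. A minimal sketch, using the `LABELS` list above and dummy logits in place of the model's `command` output:

```python
import torch

# Dummy logits standing in for out["command"] from the usage example below
logits = torch.randn(1, 9)                 # (batch_size, 9)
probs = torch.softmax(logits, dim=-1)      # optional: per-class probabilities
pred = logits.argmax(dim=-1)               # (batch_size,) indices into LABELS
print([LABELS[i] for i in pred.tolist()])  # e.g. ['straight_constant_high_speed']
```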
## Waypoint Coordinate System

The `waypoints` output consists of 44 waypoints, each representing the vehicle's position at a specific timestamp. Each waypoint is a 2D vector `(x, y)` in a 2D Cartesian coordinate system:

- The origin `(0, 0)` is defined as the initial position of the vehicle at the start of the video.
- The `x`-axis corresponds to the lateral direction of the vehicle, with positive values indicating movement to the right.
- The `y`-axis corresponds to the forward direction of the vehicle, with positive values indicating forward movement.
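Under this convention, simple trajectory statistics follow directly from the waypoints. A minimal sketch, using dummy data in place of the model's `waypoints` output:

```python
import torch

# Dummy waypoints standing in for out["waypoints"][0] from the usage example below
waypoints = torch.randn(44, 2)                # (44, 2), columns are (x, y)

steps = waypoints[1:] - waypoints[:-1]        # per-step displacement vectors
step_dist = torch.linalg.norm(steps, dim=-1)  # per-step distances
path_length = step_dist.sum()                 # total distance along the trajectory
net_forward = waypoints[-1, 1]                # final y: net forward displacement
net_lateral = waypoints[-1, 0]                # final x: net lateral displacement (right-positive)
```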
## Training Dataset
## Authors

Here are the team members who contributed to the development of ACT-Bench and ACT-Estimator:
- Hidehisa Arai
- Keishi Ishihara
- Tsubasa Takahashi
- Yu Yamaguchi
## How to use

```python
import torch
from transformers import AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

act_estimator = AutoModel.from_pretrained("turing-motors/Act-Estimator", trust_remote_code=True)
act_estimator.to(device)
act_estimator.eval()

# dummy inputs
generated_videos = torch.randn(1, 3, 44, 224, 224).to(device)
timestamps = torch.randn(1, 44).to(device)

with torch.no_grad():  # inference only; no gradients needed
    out = act_estimator(generated_videos, timestamps)

print(out.keys())               # dict_keys(['command', 'waypoints'])
print(out["command"].size())    # torch.Size([1, 9])
print(out["waypoints"].size())  # torch.Size([1, 44, 2])
```
## License

ACT-Estimator is licensed under CC-BY-NC-SA-4.0.
## Citation

If you find our work helpful, please feel free to cite us.

```bibtex
@misc{arai2024actbench,
      title={ACT-Bench: Towards Action Controllable World Models for Autonomous Driving},
      author={Hidehisa Arai and Keishi Ishihara and Tsubasa Takahashi and Yu Yamaguchi},
      year={2024},
      eprint={2412.05337},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05337},
}
```