---
license: apache-2.0
pipeline_tag: image-to-video
tags:
- autonomous driving
- video generation
- world model
---

# Model Card for Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

![Vista preview animation](card.gif)

## Brief Introduction

**Vista** is a generalizable driving world model that is capable of:
- **High-Fidelity Future Prediction:** *Predict high-fidelity futures in various scenarios*.
- **Coherent Long-Horizon Rollout:** *Extend its predictions to continuous and long horizons*.
- **Versatile Action Controllability:** *Execute multi-modal actions (steering angles, speeds, commands, trajectories, goal points)*.
- **Generalizable Reward Function:** *Provide rewards for different actions without accessing ground-truth actions*.

## Related Links

For more technical details and discussions, please refer to:
- **Paper:** https://arxiv.org/abs/2405.17398
- **Code:** https://github.com/OpenDriveLab/Vista
- **Demo:** https://vista-demo.github.io

## How to Use

For usage instructions, see the official code repository: https://github.com/OpenDriveLab/Vista
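
As a starting point, the repository linked above can be fetched programmatically. The following is a minimal sketch using only the Python standard library. The helper names `build_clone_command` and `clone_vista` and the default target directory `Vista` are illustrative assumptions and are not part of the Vista codebase; `git` is assumed to be installed and on the `PATH`.

```python
import subprocess
from pathlib import Path

# URL taken from the "Related Links" section of this card.
VISTA_REPO_URL = "https://github.com/OpenDriveLab/Vista"


def build_clone_command(target_dir="Vista"):
    """Return the git command (as an argument list) that clones the repository."""
    # Hypothetical helper: the target directory name is an arbitrary choice.
    return ["git", "clone", VISTA_REPO_URL, target_dir]


def clone_vista(target_dir="Vista"):
    """Clone the repository unless the target directory already exists."""
    target = Path(target_dir)
    if not target.exists():
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(build_clone_command(target_dir), check=True)
    return target
```

Calling `clone_vista()` clones the code into `./Vista` on first use and simply returns the path on later calls. Environment setup, checkpoints, and inference are covered by the repository's own documentation.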

## Citation

```bibtex
@article{gao2024vista,
  title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability},
  author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
  journal={arXiv preprint arXiv:2405.17398},
  year={2024}
}

@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```

## Contact

If you have any questions or comments, feel free to email sygao@connect.ust.hk.