---
license: mit
language:
- en
base_model:
- facebook/dinov2-small
---
# ViTGaze 👀

### Gaze Following with Interaction Features in Vision Transformers

[Yuehao Song](https://scholar.google.com/citations?user=7sqkA-MAAAAJ)<sup>1</sup>, [Xinggang Wang](https://xwcv.github.io)<sup>1,✉️</sup>, [Jingfeng Yao](https://scholar.google.com/citations?user=4qc1qJ0AAAAJ)<sup>1</sup>, [Wenyu Liu](http://eic.hust.edu.cn/professor/liuwenyu/)<sup>1</sup>, Jinglin Zhang<sup>2</sup>, Xiangmin Xu<sup>3</sup>

<sup>1</sup> Huazhong University of Science and Technology, <sup>2</sup> Shandong University, <sup>3</sup> South China University of Technology

(✉️ corresponding author)

Accepted by Visual Intelligence ([Paper](https://link.springer.com/article/10.1007/s44267-024-00064-9))

[arXiv](https://arxiv.org/abs/2403.12778) · [Code](https://github.com/hustvl/ViTGaze) · [Papers with Code](https://paperswithcode.com/paper/vitgaze-gaze-following-with-interaction) · [SOTA: Gaze Target Estimation on GazeFollow](https://paperswithcode.com/sota/gaze-target-estimation-on-gazefollow?p=vitgaze-gaze-following-with-interaction) · [SOTA: Gaze Target Estimation](https://paperswithcode.com/sota/gaze-target-estimation-on?p=vitgaze-gaze-following-with-interaction)


### News
* **`Nov. 21st, 2024`:** ViTGaze is accepted by Visual Intelligence! 🎉
* **`Mar. 25th, 2024`:** We released an initial version of ViTGaze.
* **`Mar. 19th, 2024`:** We released our paper on arXiv. Code/Models are coming soon. Please stay tuned! ☕️
## Introduction
A plain Vision Transformer can also do gaze following with the simple ViTGaze framework!

Inspired by the remarkable success of pre-trained plain Vision Transformers (ViTs), we introduce a novel single-modality gaze following framework, **ViTGaze**. In contrast to previous methods, it builds a brand-new gaze following framework based mainly on a powerful encoder (the decoder accounts for less than 1% of the parameters). Our principal insight is that the inter-token interactions within self-attention can be transferred to interactions between humans and scenes. Our method achieves state-of-the-art (SOTA) performance among all single-modality methods (a 3.4% improvement in AUC and a 5.1% improvement in AP) and highly comparable performance against multi-modality methods with 59% fewer parameters.
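
To make the idea concrete, here is a rough, self-contained sketch (not the official ViTGaze implementation): it pulls the self-attention maps of a pre-trained DINOv2-small encoder and reads the attention from a hypothetical "head" patch token to all scene patches as an interaction feature, then upsamples it into a toy gaze heatmap. The token choice and the decoder below are illustrative placeholders.

```python
# A minimal sketch of the core idea, NOT the official ViTGaze code.
# Assumptions: facebook/dinov2-small as the encoder (patch size 14), a
# hypothetical "head token" index picked from the head region, and bilinear
# upsampling standing in for the paper's lightweight decoder.
import torch
import torch.nn.functional as F
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small")
encoder = AutoModel.from_pretrained("facebook/dinov2-small")
encoder.eval()

image = torch.rand(3, 224, 224)  # placeholder scene image in [0, 1]
inputs = processor(images=image, return_tensors="pt", do_rescale=False)

with torch.no_grad():
    outputs = encoder(**inputs, output_attentions=True)

# Attention maps from the last block: (batch, heads, tokens, tokens).
attn = outputs.attentions[-1]
num_patches = attn.shape[-1] - 1          # drop the [CLS] token
side = int(num_patches ** 0.5)            # e.g. 224 / 14 = 16

# Hypothetical choice: the patch token covering the person's head
# (here simply the center patch, for illustration).
head_token = 1 + side * (side // 2) + side // 2

# Head-averaged attention from the head token to every scene patch:
# this is the "interaction feature" intuition behind ViTGaze.
interaction = attn[:, :, head_token, 1:].mean(dim=1)       # (batch, num_patches)
interaction_map = interaction.reshape(-1, 1, side, side)

# Toy decoder: upsample the interaction map into a gaze heatmap.
heatmap = F.interpolate(interaction_map, size=(64, 64),
                        mode="bilinear", align_corners=False)
print(heatmap.shape)  # torch.Size([1, 1, 64, 64])
```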
## Results
> Results from the [ViTGaze paper](https://link.springer.com/article/10.1007/s44267-024-00064-9)

Corresponding checkpoints are released; a small snippet for inspecting them follows the list:
- GazeFollow: [GoogleDrive](https://drive.google.com/file/d/164c4woGCmUI8UrM7GEKQrV1FbA3vGwP4/view?usp=drive_link)
- VideoAttentionTarget: [GoogleDrive](https://drive.google.com/file/d/11_O4Jm5wsvQ8qfLLgTlrudqSNvvepsV0/view?usp=drive_link)
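
The released files are expected to be standard PyTorch checkpoints (detectron2-style); the `"model"` key and the file name below are assumptions, so adjust them to whatever `torch.load` actually returns.

```python
# Hedged sketch: inspect a downloaded ViTGaze checkpoint, assuming it is a
# regular PyTorch checkpoint file. Key names here are illustrative only.
import torch

ckpt = torch.load("gazefollow_checkpoint.pth", map_location="cpu")

# Detectron2-style checkpoints usually nest weights under a "model" key;
# fall back to the loaded object itself otherwise.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```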
## Getting Started
- [Installation](docs/install.md)
- [Train](docs/train.md)
- [Eval](docs/eval.md)
## Acknowledgements
ViTGaze is based on [detectron2](https://github.com/facebookresearch/detectron2). We use the efficient multi-head attention implemented in the [xFormers](https://github.com/facebookresearch/xformers) library.
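
For reference, the xFormers memory-efficient attention op is invoked roughly as follows (a generic usage sketch with made-up shapes, not code from the ViTGaze repository):

```python
# Generic xFormers usage sketch; requires a CUDA device.
# Tensors are laid out as (batch, sequence, heads, head_dim).
import torch
import xformers.ops as xops

q = torch.randn(1, 257, 6, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 257, 6, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 257, 6, 64, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v)  # same shape as q
print(out.shape)
```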
## Citation
If you find ViTGaze useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{song2024vitgaze,
  title   = {ViTGaze: Gaze Following with Interaction Features in Vision Transformers},
  author  = {Song, Yuehao and Wang, Xinggang and Yao, Jingfeng and Liu, Wenyu and Zhang, Jinglin and Xu, Xiangmin},
  journal = {Visual Intelligence},
  volume  = {2},
  number  = {31},
  year    = {2024},
  url     = {https://doi.org/10.1007/s44267-024-00064-9}
}
```