MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
Abstract
As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.
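The core idea of the abstract — injecting a slowly updated long-term memory into each track embedding through a memory-attention layer — can be illustrated with a minimal sketch. This is not the authors' implementation: the module name, the exponential-moving-average update, and all hyperparameters (`dim`, `momentum`, number of heads) are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class MemoryAugmentedTrack(nn.Module):
    """Hypothetical sketch of long-term memory injection for MOT.

    Each track keeps a long-term memory vector, updated slowly (EMA-style
    here, an assumption), and the current track embedding attends to that
    memory so it stays stable and distinguishable over long horizons.
    """

    def __init__(self, dim: int = 256, momentum: float = 0.01):
        super().__init__()
        self.momentum = momentum
        # Customized memory-attention layer: queries are the current track
        # embeddings, keys/values come from the long-term memory.
        self.memory_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def update_memory(self, memory: torch.Tensor, track_embed: torch.Tensor) -> torch.Tensor:
        # Slow exponential-moving-average update keeps the memory stable
        # against per-frame noise (update rule is an assumption).
        return (1.0 - self.momentum) * memory + self.momentum * track_embed

    def forward(self, track_embed: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # track_embed, memory: (batch, num_tracks, dim)
        injected, _ = self.memory_attn(query=track_embed, key=memory, value=memory)
        # Residual injection of long-term context into the track embedding.
        return self.norm(track_embed + injected)

# Toy usage: 5 tracked objects with 256-d embeddings.
model = MemoryAugmentedTrack(dim=256)
embeds = torch.randn(1, 5, 256)
memory = torch.randn(1, 5, 256)
out = model(embeds, memory)                     # memory-injected embeddings
new_memory = model.update_memory(memory, out.detach())
```

The EMA update and residual attention are common choices for this kind of memory mechanism; the actual MeMOTR update rule and layer layout should be taken from the linked repository.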