Multi-object tracking (MOT) is the problem of detecting multiple objects in consecutive video frames and keeping track of each of them throughout the whole video. MOT is now widely used in applications such as autonomous vehicles, surveillance cameras, and alerting systems. In these applications, accurate detection and reliable tracking that does not lose objects between frames are key performance requirements.
In MOT, the two main steps are object detection and data association between consecutive frames. In many current models, these two tasks are designed and optimized separately and are only combined in the final pipeline. This makes it difficult to optimize the task as a whole, because each submodule is optimized for its own objective independently and the two do not exchange information during optimization. Therefore, some recent studies have focused on performing both stages within a single model. Graph neural networks are well suited to this goal: by modeling the data as a graph, they can effectively exploit the temporal and spatial relations involved in both steps. This structure makes it possible to build an integrated model in which detection and association are designed and optimized jointly.
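To make the graph formulation more concrete, the following is a minimal sketch under simplifying assumptions. It is not the proposed model. Detections in two consecutive frames become nodes, every cross-frame pair of detections becomes an edge carrying a feature (here only the IoU of the two boxes), and association is cast as selecting edges. The function `score_edges` is a hypothetical placeholder for a learned GNN edge classifier and simply returns the IoU. The names `build_graph`, `score_edges`, `associate`, and the 0.3 threshold are illustrative choices.

```python
from itertools import product

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_graph(frame_a: list[Box], frame_b: list[Box]) -> dict:
    """Nodes are detections; an edge links every detection in frame_a
    to every detection in frame_b and stores its edge features."""
    edges = {}
    for i, j in product(range(len(frame_a)), range(len(frame_b))):
        edges[(i, j)] = {"iou": iou(frame_a[i], frame_b[j])}
    return edges


def score_edges(edges: dict) -> dict:
    """Hypothetical stand-in for a learned GNN edge classifier: uses the
    IoU feature directly as the score that both nodes are the same object."""
    return {e: feats["iou"] for e, feats in edges.items()}


def associate(scores: dict, threshold: float = 0.3) -> list[tuple[int, int]]:
    """Greedy one-to-one matching: accept the highest-scoring edges whose
    endpoints are both still unmatched, stopping below the threshold."""
    used_a, used_b, matches = set(), set(), []
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < threshold:
            break
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(matches)


# Usage: two objects that move slightly and swap order between frames.
frame_t = [(0, 0, 10, 10), (20, 20, 30, 30)]
frame_t1 = [(21, 21, 31, 31), (1, 1, 11, 11)]
links = associate(score_edges(build_graph(frame_t, frame_t1)))
# → [(0, 1), (1, 0)]
```

Greedy selection is used here for brevity. An optimal one-to-one assignment, such as the Hungarian algorithm, is a common alternative.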
In this research, we propose a model that merges the detection and association tasks into a single model based on graph neural networks, with the aim of providing a more efficient and better-optimized tracking method.
We will evaluate the proposed method on the MOT16, MOT17, MOT20, ETHZ, and Caltech datasets. The results will be reported using the CLEAR MOT metrics (MOTA and MOTP), ID scores, False Trajectories, Fragments, MT (Mostly Tracked), ML (Mostly Lost), and IDS (ID Switches), and will be compared with those of well-known existing models.
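For reference, the two CLEAR MOT scores can be written as MOTA = 1 − Σ_t (FN_t + FP_t + IDSW_t) / Σ_t GT_t and MOTP = Σ_{t,i} d_{t,i} / Σ_t c_t. Here c_t is the number of matched pairs in frame t and d_{t,i} is the overlap of match i. The sketch below computes both scores from per-frame counts and assumes the matching between tracker output and ground truth has already been done. `FrameStats` is a hypothetical container. MOTP is taken as the mean IoU of matched pairs, following the MOTChallenge convention; some toolkits report a distance such as 1 − IoU instead.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Per-frame counts from matching tracker output to ground truth
    (hypothetical container used only for this illustration)."""
    num_gt: int             # ground-truth objects in the frame
    false_negatives: int    # ground-truth objects left unmatched
    false_positives: int    # hypotheses matched to no ground-truth object
    id_switches: int        # matches whose track ID changed since the previous match
    match_ious: list[float] # IoU of each matched (ground truth, hypothesis) pair


def mota(frames: list[FrameStats]) -> float:
    """MOTA = 1 - (total FN + FP + IDSW) / total ground-truth objects.
    Can be negative when errors outnumber ground-truth objects."""
    total_gt = sum(f.num_gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.false_negatives + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


def motp(frames: list[FrameStats]) -> float:
    """MOTP as the mean IoU over all matched pairs in all frames."""
    ious = [v for f in frames for v in f.match_ious]
    return sum(ious) / len(ious) if ious else 0.0


# Usage: two frames with three ground-truth objects each.
frames = [
    FrameStats(num_gt=3, false_negatives=1, false_positives=0, id_switches=0,
               match_ious=[0.8, 0.6]),
    FrameStats(num_gt=3, false_negatives=0, false_positives=1, id_switches=1,
               match_ious=[0.9, 0.7, 0.5]),
]
score_mota = mota(frames)  # → 0.5, since 1 - 3/6
score_motp = motp(frames)  # approximately 0.7, since 3.5/5
```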