Joint Feature Correspondences and Appearance Similarity for Robust Visual Object Tracking
Journal article, 2010
A novel visual object tracking scheme is proposed that jointly exploits point feature correspondences and object appearance similarity. For point feature-based tracking, we propose a candidate tracker that simultaneously exploits two separate sets of point feature correspondences, one in the foreground and one in the surrounding background, where the background features serve as an indicator of occlusions. Feature points in these two sets are dynamically maintained. For object appearance-based tracking, we propose a candidate tracker based on an enhanced anisotropic mean shift with a fully tunable (5 degrees of freedom) bounding box, partially guided by the above feature point tracker. Both candidate trackers include a re-initialization process that resets the tracker to prevent accumulated tracking errors from propagating across frames. In addition, a novel online learning method is introduced to the enhanced mean shift-based candidate tracker: the reference object distribution is updated in each time interval whenever there is an indication of stable and reliable tracking without background interference. By dynamically updating the reference object model, tracking is further improved through a more accurate object appearance similarity measure. The final tracking result is then selected from the outputs of the two candidate trackers according to an optimal selection criterion. Experiments have been conducted on several videos containing a range of complex scenarios. Performance is assessed using three objective criteria and compared with two existing trackers. Our results show that the proposed scheme is robust and yields marked improvements in tracking drift and in the tightness and accuracy of the tracked bounding boxes, especially for complex video scenarios containing long-term partial occlusions or object intersections, deformation, or background clutter with color distributions similar to the foreground object.
Keywords: visual object tracking; consensus point feature correspondences; SIFT descriptor; RANSAC; dynamic maintenance; enhanced anisotropic mean shift; object appearance model; online learning of reference object
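To make the two-candidate structure described in the abstract concrete, below is a minimal Python/OpenCV sketch, offered only as an illustration and not as the authors' implementation: it substitutes `cv2.findHomography` with RANSAC for the paper's consensus point feature correspondences, and OpenCV's CamShift (a mean shift variant that returns a rotated, five-parameter box) for the enhanced anisotropic mean shift. The occlusion test on background features, the re-initialization logic, and the optimal selection criterion are not shown; all function names and parameter values here are assumptions.

```python
# Illustrative sketch only -- not the authors' implementation.
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def point_feature_candidate(ref_gray, cur_gray, ref_box):
    """Candidate 1: locate the object via SIFT correspondences filtered
    by a RANSAC consensus (here a homography), standing in for the
    paper's foreground/background correspondence sets."""
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(cur_gray, None)
    if des1 is None or des2 is None:
        return None
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])  # Lowe's ratio test
    if len(good) < 4:
        return None  # too few matches: a cue to re-initialize
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    x, y, w, h = ref_box
    corners = np.float32([[x, y], [x + w, y],
                          [x + w, y + h], [x, y + h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)  # warped object outline

def appearance_candidate(cur_bgr, ref_hist, search_box):
    """Candidate 2: appearance similarity via histogram back-projection
    and CamShift, whose rotated box has the 5 degrees of freedom
    (center x/y, width, height, orientation) named in the abstract."""
    hsv = cv2.cvtColor(cur_bgr, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], ref_hist, [0, 180], 1)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    rot_box, new_box = cv2.CamShift(backproj, search_box, crit)
    return rot_box, new_box

def update_reference(ref_hist, new_hist, alpha=0.1):
    """Online learning placeholder: blend the current histogram into the
    reference model only when tracking is judged stable and free of
    background interference (that reliability test is not shown here)."""
    blended = (1.0 - alpha) * ref_hist + alpha * new_hist
    return blended.astype(np.float32)  # back-projection expects float32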